Explanipedia

Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition

Nagham Hamad, Mohammed Khalilia, Mustafa Jarrar · 2025

We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types…

Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset

Fakhraddin Alwajih, Samar M. Magdy, Abdellah El Mekki, Omer Nacar, Youssef Nafea, et al. · 2025

Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explic…

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

Fakhraddin Alwajih, Abdellah El Mekki, Samar M. Magdy, AbdelRahim Elmadany, Omer Nacar, et al. · 2025

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries…

ImageEval 2025: The First Arabic Image Captioning Shared Task

Ahlam Bashiti, Alaa Aljabari, Hadi Hamoud, Md. Rafiul Biswas, Bilal Mohammed Shalash, et al. · 2025

WojoodOntology: Ontology-Driven LLM Prompting for Unified Information Extraction Tasks

Alaa Aljabari, Nagham Hamad, Mohammed Khalilia, Mustafa Jarrar · 2025

WojoodRelations: Arabic Relation Extraction Corpus and Modeling

Alaa Aljabari, Mohammed Khalilia, Mustafa Jarrar · 2025

The AraGenEval Shared Task on Arabic Authorship Style Transfer and AI Generated Text Detection

Shadi Abudalfa, Saad Ezzini, Ahmed Abdelalí, Hamza Alami, Abdessamad Benlahbib, et al. · 2025

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

Bashar Talafha, Hawau Olamide Toyin, Peter Sullivan, AbdelRahim Elmadany, Abdurrahman Juma, et al. · 2025

Active Learning for Multidialectal Arabic POS Tagging

Diyam Akra, Mohammed Khalilia, Mustafa Jarrar · 2025

SinaTools: Open Source Toolkit for Arabic Natural Language Processing

Tymaa Hammouda, Mustafa Jarrar, Mohammed Khalilia · 2024

We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding. SinaTools is a unified package allowing people to integrate it into their system workflow, offering solutions for various tasks…

Casablanca: Data and Models for Multidialectal Arabic Speech Recognition

Bashar Talafha, Karima Kadaoui, Samar M. Magdy, Mariem Habiboullah, Chafei Mohamed Chafei, et al. · 2024

In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclu…

ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task

Mohammed Khalilia, Sanad Malaysha, Reem Suwaileh, Mustafa Jarrar, Alaa Aljabari, et al. · 2024

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the …

Event-Arguments Extraction Corpus and Modeling using BERT for Arabic

Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia · 2024

Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the Hadath corpus (550k tokens) as an extension of Wojood, enriched with event-argument annotatio…

The FIGNEWS Shared Task on News Media Narratives

Wajdi Zaghouani, Mustafa Jarrar, Nizar Habash, Houda Bouamor, Imed Zitouni, et al. · 2024

We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days…

AraFinNLP 2024: The First Arabic Financial NLP Shared Task

Sanad Malaysha, Mo El-Haj, Saad Ezzini, Mohammed Khalilia, Mustafa Jarrar, et al. · 2024

The expanding financial markets of the Arab world require sophisticated Arabic NLP tools. To address this need within the banking domain, the Arabic Financial NLP (AraFinNLP) shared task proposes two subtasks: (i) Multi-dialect Intent Dete…

WojoodNER 2024: The Second Arabic Named Entity Recognition Shared Task

Mustafa Jarrar, Nagham Hamad, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, et al. · 2024

We present WojoodNER-2024, the second Arabic Named Entity Recognition (NER) Shared Task. In WojoodNER-2024, we focus on fine-grained Arabic NER. We provided participants with a new Arabic fine-grained NER dataset called WojoodFine, annotat…

Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda

Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel · 2024

The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection. This article presents a multilingual corpus of 12,000 Facebook posts fully …

Are Large Language Models the New Interface for Data Pipelines?

Sylvio Barbon, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, et al. · 2024

A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-lik…

Qabas: An Open-Source Arabic Lexicographic Database

Mustafa Jarrar, Tymaa Hammouda · 2024

We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons.…

NLU-STR at SemEval-2024 Task 1: Generative-based Augmentation and Encoder-based Scoring for Semantic Textual Relatedness

Sanad Malaysha, Mustafa Jarrar, Mohammed Khalilia · 2024

Semantic textual relatedness is a broader concept of semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts. This notion of relatedness can be applied…

Alma: Fast Lemmatizer and POS Tagger for Arabic

Mustafa Jarrar, Diyam Akra, Tymaa Hammouda · 2024

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, Sana Ghanem · 2023

This paper presents the ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries to ArBanking77 dat…

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

Mustafa Jarrar, Sanad Malaysha, Tymaa Hammouda, Mohammed Khalilia · 2023

SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, all of which are sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA's novelty lies in how tokens and s…

Arabic Fine-Grained Entity Recognition

Haneen Liqreina, Mustafa Jarrar, Mohammed Khalilia, Ahmed Oumar El-Shangiti, Muhammad Abdul-Mageed · 2023

Traditional NER systems are typically trained to recognize coarse-grained entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level subtypes. This article aims to advance Arabic NER with fin…

Nabra: Syrian Arabic Dialects with Morphological Annotations

Amal Nayouf, Tymaa Hammouda, Mustafa Jarrar, Fadi A. Zaraket, Mohamad-Bassam Kurdy · 2023

This paper presents Nabra, a corpus of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources including social media posts, scripts …

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, et al. · 2023

We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate …