Explanipedia

EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs

Numaan Naeem, Abdellah El Mekki, Muhammad Abdul-Mageed · 2025

Large language models (LLMs) are transforming education by answering questions, explaining complex concepts, and generating content across a wide range of subjects. Despite strong performance on academic benchmarks, they often fail to tail…

PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture

Fakhraddin Alwajih, Abdellah El Mekki, Asmaa Mohamed · 2025

Large Language Models (LLMs) inherently reflect the vast data distributions they encounter during their pre-training phase. As this data is predominantly sourced from the web, there is a high chance it will be skewed towards high-resourced…

Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset

Fakhraddin Alwajih, Samar M. Magdy, Abdellah El Mekki, Omer Nacar, Youssef Nafea, et al. · 2025

Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explic…

NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities

Abdellah El Mekki, Houdaifa Atou, Shady Shehata, Muhammad Abdul-Mageed · 2025

Enhancing the linguistic capabilities of Large Language Models (LLMs) to include low-resource languages is a critical research area. Current research directions predominantly rely on synthetic data generated by translating English corpora,…

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

Fakhraddin Alwajih, Abdellah El Mekki, Samar M. Magdy, AbdelRahim Elmadany, Omer Nacar, et al. · 2025

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries…

Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs

Abdellah El Mekki, Muhammad Abdul-Mageed · 2025

PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture

Fakhraddin Alwajih, Abdellah El Mekki, Hamdy Mubarak, Majd Hawasly, Asmaa Mohamed, et al. · 2025

NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities

Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, Muhammad Abdul-Mageed · 2025

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

Gagan Bhatia, El Moatez Billah Nagoudi, Abdellah El Mekki, Fakhraddin Alwajih, Muhammad Abdul-Mageed · 2025

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

Gagan Bhatia, El Moatez Billah Nagoudi, Abdellah El Mekki, Fakhraddin Alwajih, Muhammad Abdul-Mageed · 2024

We introduce Swan, a family of embedding models centred around the Arabic language, addressing both small-scale and large-scale use cases. Swan includes two variants: Swan-Small, based on ARBERTv2, and Swan-Large, built on ArMistral,…

Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs

Abdellah El Mekki, Muhammad Abdul-Mageed · 2024

Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given…

Casablanca: Data and Models for Multidialectal Arabic Speech Recognition

Bashar Talafha, Karima Kadaoui, Samar M. Magdy, Mariem Habiboullah, Chafei Mohamed Chafei, et al. · 2024

In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclu…

ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting

Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismaïl Berrada, Ahmed Khoumsi · 2023

Bilingual Lexicon Induction (BLI), where words are translated between two languages, is an important NLP task. While noticeable progress on BLI in rich resource languages using static word embeddings has been achieved, the word translation…

Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of t…

CS-UM6P at SemEval-2022 Task 6: Transformer-based Models for Intended Sarcasm Detection in English and Arabic

Abdelkader El Mahdaouy, Abdellah El Mekki, Kabil Essefar, Abderrahman Skiredj, Ismaïl Berrada · 2022

Sarcasm is a form of figurative language where the intended meaning of a sentence differs from its literal meaning. This poses a serious challenge to several Natural Language Processing (NLP) applications such as Sentiment Analysis, Opinio…

Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media

Abdelkader El Mahdaouy, Abdellah El Mekki, Ahmed Oumar, Hajar Mousannif, Ismaïl Berrada · 2022

The prevalence of toxic content on social media platforms, such as hate speech, offensive language, and misogyny, presents serious challenges to our interconnected society. These challenging issues have attracted widespread attention in Na…

UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer

Abdellah El Mekki, Abdelkader El Mahdaouy, Mohammed Akallouch, Ismaïl Berrada, Ahmed Khoumsi · 2022

Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and c…

BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification

Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismaïl Berrada, et al. · 2021

Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-leve…

BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification

Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismaïl Berrada, et al. · 2021

Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-level a…

Deep Multi-Task Model for Sarcasm Detection and Sentiment Analysis in Arabic Language

Abdelkader El Mahdaouy, Abdellah El Mekki, Kabil Essefar, Nabil El Mamoun, Ismaïl Berrada, et al. · 2021

The prominence of figurative language devices, such as sarcasm and irony, poses serious challenges for Arabic Sentiment Analysis (SA). While previous research works tackle SA and sarcasm detection separately, this paper introduces an end-t…

On the Role of Orthographic Variations in Building Multidialectal Arabic Word Embeddings

Abdellah El Mekki, Abdelkader El Mahdaouy, Ismaïl Berrada, Ahmed Khoumsi · 2021

Dialectal Arabic (DA) is mostly used by over 400 million people across Arab countries as a communication channel on social media platforms, web forums, and daily life. Building Natural Language Processing systems for each DA variant is a c…

An open access NLP dataset for Arabic dialects: Data collection, labeling, and model construction

ElMehdi Boujou, Hamza Chataoui, Abdellah El Mekki, S Benjelloun, Ikram Chairi, et al. · 2021

Natural Language Processing (NLP) is today a very active field of research and innovation. Many applications, however, need large sets of data for supervised learning, suitably labelled for training. This includes applications for …

Domain Adaptation for Arabic Cross-Domain and Cross-Dialect Sentiment Analysis from Contextualized Word Embedding

Abdellah El Mekki, Abdelkader El Mahdaouy, Ismaïl Berrada, Ahmed Khoumsi · 2021

Abdellah El Mekki, Abdelkader El Mahdaouy, Ismail Berrada, Ahmed Khoumsi. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

CS-UM6P at SemEval-2021 Task 1: A Deep Learning Model-based Pre-trained Transformer Encoder for Lexical Complexity

Nabil El Mamoun, Abdelkader El Mahdaouy, Abdellah El Mekki, Kabil Essefar, Ismaïl Berrada · 2021

Lexical Complexity Prediction (LCP) involves assigning a difficulty score to a particular word or expression, in a text intended for a target audience. In this paper, we introduce a new deep learning-based system for this challenging task.…

CS-UM6P at SemEval-2021 Task 7: Deep Multi-Task Learning Model for Detecting and Rating Humor and Offense

Kabil Essefar, Abdellah El Mekki, Abdelkader El Mahdaouy, Nabil El Mamoun, Ismaïl Berrada · 2021

Humor detection has become a topic of interest for several research teams, especially those involved in socio-psychological studies, with the aim to detect the humor and the temper of a targeted population (e.g. a community, a city, a coun…

Abdellah El Mekki