Ismaïl Berrada
Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explic…
GemMaroc: Unlocking Darija Proficiency in LLMs with Minimal Data
Open-source large language models (LLMs) still marginalise Moroccan Arabic (Darija), forcing practitioners either to bolt on heavyweight Arabic adapters or to sacrifice the very reasoning skills that make LLMs useful. We show that a rigoro…
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries…
KoopAGRU: A Koopman-based Anomaly Detection in Time-Series using Gated Recurrent Units
Anomaly detection in real-world time-series data is a challenging task due to the complex and nonlinear temporal dynamics involved. This paper introduces KoopAGRU, a new deep learning model designed to tackle this problem by combining Fast…
Dialect2SQL: A Novel Text-to-SQL Dataset for Arabic Dialects with a Focus on Moroccan Darija
The task of converting natural language questions (NLQs) into executable SQL queries, known as text-to-SQL, has gained significant interest in recent years, as it enables non-technical users to interact with relational databases. Many benc…
Phoenix at Palmx: Exploring Data Augmentation for Arabic Cultural Question Answering
DarijaBanking: A new resource for overcoming language barriers in banking intent detection for Moroccan Arabic speakers
Navigating the complexities of language diversity is a central challenge in developing robust natural language processing systems, especially in specialized domains like banking. The Moroccan Dialect of Arabic (Darija) serves as a common l…
Casablanca: Data and Models for Multidialectal Arabic Speech Recognition
In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclu…
DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification
Detecting and classifying suspicious or malicious domain names and URLs is a fundamental task in cybersecurity. To leverage such indicators of compromise, cybersecurity vendors and practitioners often maintain and update blacklists of known …
AraFinNLP 2024: The First Arabic Financial NLP Shared Task
The expanding financial markets of the Arab world require sophisticated Arabic NLP tools. To address this need within the banking domain, the Arabic Financial NLP (AraFinNLP) shared task proposes two subtasks: (i) Multi-dialect Intent Dete…
DarijaBanking: A New Resource for Overcoming Language Barriers in Banking Intent Detection for Moroccan Arabic Speakers
Navigating the complexities of language diversity is a central challenge in developing robust natural language processing systems, especially in specialized domains like banking. The Moroccan Dialect (Darija) serves as the common language …
Arabic Text Diacritization In The Age Of Transfer Learning: Token Classification Is All You Need
Automatic diacritization of Arabic text involves adding diacritical marks (diacritics) to the text. This task poses a significant challenge with noteworthy implications for computational processing and comprehension. In this paper, we intr…
Lightweight Federated Learning for Efficient Network Intrusion Detection
Network Intrusion Detection Systems (NIDS) play a crucial role in ensuring cybersecurity across various digital infrastructures. However, traditional NIDS face significant challenges, including high computational and storage costs, as well…
Unlocking the Power of Transfer Learning with Ad-Dabit-Al-Lughawi: A Token Classification Approach for Enhanced Arabic Text Diacritization
CT-xCOV: a CT-scan based Explainable Framework for COVid-19 diagnosis
In this work, CT-xCOV, an explainable framework for COVID-19 diagnosis using Deep Learning (DL) on CT-scans, is developed. CT-xCOV adopts an end-to-end approach from lung segmentation to COVID-19 detection and explanations of the detection …
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting
Bilingual Lexicon Induction (BLI), where words are translated between two languages, is an important NLP task. While noticeable progress on BLI in rich resource languages using static word embeddings has been achieved, the word translation…
Message from the FiCloud-2023 Chairs
Welcome to the 10th International Conference on Future Internet of Things and Cloud (FiCloud-2023), which is held online and on-site in Marrakech, Morocco, during 14-16 August 2023. Marrakech is one of the legendary cities surrounded by th…
DAQAS: Deep Arabic Question Answering System based on duplicate question detection and machine reading comprehension
As of late, various deep learning techniques and methods have shown their superiority to feature-based and shallow learning techniques in the field of open-domain question–answering systems (OpenQAS). However, only a few works adopted thes…
UL & UM6P at SemEval-2023 Task 10: Semi-Supervised Multi-task Learning for Explainable Detection of Online Sexism
This paper introduces our participating system for SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). The EDOS shared task covers three hierarchical sub-tasks for sexism detecti…
UM6P at SemEval-2023 Task 3: News genre classification based on transformers, graph convolution networks and number of sentences
This paper presents our proposed method for English document genre classification in the context of SemEval 2023 task 3, subtask 1. Our method uses an ensemble technique to combine the predictions of four distinct models: Longformer, RoBERTa, GCN, a…
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting
Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of t…
UM6P & UL at WojoodNER shared task: Improving Multi-Task Learning for Flat and Nested Arabic Named Entity Recognition
In this paper, we present our submitted system for the WojoodNER Shared Task, addressing both flat and nested Arabic Named Entity Recognition (NER). Our system is based on a BERT-based multi-task learning model that leverages the existing …
UM6P at SemEval-2023 Task 12: Out-Of-Distribution Generalization Method for African Languages Sentiment Analysis
This paper presents our submitted system to AfriSenti SemEval-2023 Task 12: Sentiment Analysis for African Languages. AfriSenti consists of three different tasks, covering monolingual, multilingual, and zero-shot sentiment analysis sce…
UL & UM6P at ArAIEval Shared Task: Transformer-based model for Persuasion Techniques and Disinformation detection in Arabic
In this paper, we introduce our participating system to the ArAIEval Shared Task, addressing both the detection of persuasion techniques and disinformation tasks. Our proposed system employs a pre-trained transformer-based language model f…
Driver profiling: The pathway to deeper personalization
Prediction and Privacy Scheme for Traffic Flow Estimation on the Highway Road Network
Accurate and timely traffic information is a vital element of intelligent transportation systems and urban management, and it matters to both road users and government agencies. However, existing traffic prediction approaches are p…
CS-UM6P at SemEval-2022 Task 6: Transformer-based Models for Intended Sarcasm Detection in English and Arabic
Sarcasm is a form of figurative language where the intended meaning of a sentence differs from its literal meaning. This poses a serious challenge to several Natural Language Processing (NLP) applications such as Sentiment Analysis, Opinio…
Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media
The prevalence of toxic content on social media platforms, such as hate speech, offensive language, and misogyny, presents serious challenges to our interconnected society. These challenging issues have attracted widespread attention in Na…