Maraim Masoud
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models Open
Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have …
Documenting Geographically and Contextually Diverse Language Data Sources Open
Contemporary large-scale data collection efforts have prioritized the amount of data collected to improve large language models (LLM). This quantitative approach has resulted in concerns for the rights of data subjects represented in data …
Helping Cancer Patients to Choose the Best Treatment: Towards Automated Data-Driven and Personalized Information Presentation of Cancer Treatment Options Open
When a person is diagnosed with cancer, difficult decisions about treatments need to be made. In this chapter, we describe an interdisciplinary research project which aims to automatically generate personalized descriptions of treatment op…
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Open
As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, w…
Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets Open
Masader (Alyafeai et al., 2021) created a metadata structure to be used for cataloguing Arabic NLP datasets. However, developing an easy way to explore such a catalogue is a challenging task. In order to give the optimal experience for use…
Data Governance in the Age of Large-Scale Data-Driven Language Technology Open
The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to glob…
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources Open
In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect …
You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings Open
Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implication of evaluation biases…
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources Open
The NLP pipeline has evolved dramatically in the last few years. The first step in the pipeline is to find suitable annotated datasets to evaluate the tasks we are trying to solve. Unfortunately, most of the published datasets lack metadat…
Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review Open
Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed based on both stru…
Towards Machine Translation for the Kurdish Language Open
Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics and has been motivating to facilitate human …
Aspects of Terminological and Named Entity Knowledge within Rule-Based Machine Translation Models for Under-Resourced Neural Machine Translation Scenarios Open
Rule-based machine translation is a machine translation paradigm where linguistic knowledge is encoded by an expert in the form of rules that translate text from source to target language. While this approach grants extensive control over …
Back-translation approach for code-switching machine translation: A case study Open
Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most of the research has focused on translating with pure monolingual texts in the source and the target side of the parallel cor…
Leveraging rule-based machine translation knowledge for under-resourced neural machine translation models Open
Rule-based machine translation is a machine translation paradigm where linguistic knowledge is encoded by an expert in the form of rules that translate from source to target language. While this approach grants total control over the outpu…