The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

Lucas Bandarkar, Davis Liang, Benjamin Müller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2308.16884

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
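
Because every question has four answer options and the items are parallel across languages, per-language accuracy can be compared directly against a 25% random-guess baseline. The following is only a minimal sketch of that comparison, not the authors' evaluation code: the record layout `(language, predicted_option, correct_option)`, the labels `lang_a` and `lang_b`, and the toy predictions are all assumptions made for illustration.

```python
from collections import defaultdict

# Four answer options per question, so random guessing scores 1/4.
CHANCE = 1 / 4


def accuracy_by_language(records):
    """Return {language: accuracy} for (language, predicted, correct) triples.

    The triple layout is a hypothetical format chosen for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lang, predicted, correct in records:
        totals[lang] += 1
        hits[lang] += int(predicted == correct)
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Made-up predictions for illustration only; these are NOT results from the paper.
toy_records = [
    ("lang_a", 1, 1), ("lang_a", 3, 3), ("lang_a", 2, 4), ("lang_a", 4, 4),
    ("lang_b", 1, 1), ("lang_b", 2, 3), ("lang_b", 2, 4), ("lang_b", 1, 4),
]

scores = accuracy_by_language(toy_records)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: accuracy {acc:.2f}, {acc - CHANCE:+.2f} relative to chance")
```

Since the same questions appear in every language, a gap between two languages in such a table reflects the model rather than differences in question difficulty.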

Related Topics

Computer Science

Benchmark (Surveying)

Artificial Intelligence

Concepts

Computer science, Benchmark (surveying), Comprehension, Natural language processing, Vocabulary, Artificial intelligence, Language model, Reading (process), Reading comprehension, Resource (disambiguation), Linguistics, Programming language, Geography, Geodesy, Computer network, Philosophy

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2308.16884
PDF: https://arxiv.org/pdf/2308.16884
OA Status: green
Cited By: 1
Related Works: 10
OpenAlex ID: https://openalex.org/W4386384919

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4386384919
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2308.16884
Digital Object Identifier

Title: The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2023
Year of publication

Publication date: 2023-08-31
Full publication date if available

Authors: Lucas Bandarkar, Davis Liang, Benjamin Müller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
List of authors in order

Landing page: https://arxiv.org/abs/2308.16884
Publisher landing page

PDF URL: https://arxiv.org/pdf/2308.16884
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2308.16884
Direct OA link when available

Concepts: Computer science, Benchmark (surveying), Comprehension, Natural language processing, Vocabulary, Artificial intelligence, Language model, Reading (process), Reading comprehension, Resource (disambiguation), Linguistics, Programming language, Geography, Geodesy, Computer network, Philosophy
Top concepts (fields/topics) attached by OpenAlex

Cited by: 1
Total citation count in OpenAlex

Citations by year (recent): 2024: 1
Per-year citation counts (last 5 years)

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4386384919
doi	https://doi.org/10.48550/arxiv.2308.16884
ids.doi	https://doi.org/10.48550/arxiv.2308.16884
ids.openalex	https://openalex.org/W4386384919
fwci	0.25544289
type	preprint
title	The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10181
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9994999766349792
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Natural Language Processing Techniques
topics[1].id	https://openalex.org/T10028
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9993000030517578
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Topic Modeling
topics[2].id	https://openalex.org/T13629
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9977999925613403
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1702
topics[2].subfield.display_name	Artificial Intelligence
topics[2].display_name	Text Readability and Simplification
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C41008148
concepts[0].level	0
concepts[0].score	0.8207255601882935
concepts[0].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[0].display_name	Computer science
concepts[1].id	https://openalex.org/C185798385
concepts[1].level	2
concepts[1].score	0.7365621328353882
concepts[1].wikidata	https://www.wikidata.org/wiki/Q1161707
concepts[1].display_name	Benchmark (surveying)
concepts[2].id	https://openalex.org/C511192102
concepts[2].level	2
concepts[2].score	0.6963722705841064
concepts[2].wikidata	https://www.wikidata.org/wiki/Q5156948
concepts[2].display_name	Comprehension
concepts[3].id	https://openalex.org/C204321447
concepts[3].level	1
concepts[3].score	0.6501059532165527
concepts[3].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[3].display_name	Natural language processing
concepts[4].id	https://openalex.org/C2777601683
concepts[4].level	2
concepts[4].score	0.6409082412719727
concepts[4].wikidata	https://www.wikidata.org/wiki/Q6499736
concepts[4].display_name	Vocabulary
concepts[5].id	https://openalex.org/C154945302
concepts[5].level	1
concepts[5].score	0.6219724416732788
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[5].display_name	Artificial intelligence
concepts[6].id	https://openalex.org/C137293760
concepts[6].level	2
concepts[6].score	0.5189035534858704
concepts[6].wikidata	https://www.wikidata.org/wiki/Q3621696
concepts[6].display_name	Language model
concepts[7].id	https://openalex.org/C554936623
concepts[7].level	2
concepts[7].score	0.4581510126590729
concepts[7].wikidata	https://www.wikidata.org/wiki/Q199657
concepts[7].display_name	Reading (process)
concepts[8].id	https://openalex.org/C2778780117
concepts[8].level	3
concepts[8].score	0.45468780398368835
concepts[8].wikidata	https://www.wikidata.org/wiki/Q3269423
concepts[8].display_name	Reading comprehension
concepts[9].id	https://openalex.org/C206345919
concepts[9].level	2
concepts[9].score	0.4384937882423401
concepts[9].wikidata	https://www.wikidata.org/wiki/Q20380951
concepts[9].display_name	Resource (disambiguation)
concepts[10].id	https://openalex.org/C41895202
concepts[10].level	1
concepts[10].score	0.26503026485443115
concepts[10].wikidata	https://www.wikidata.org/wiki/Q8162
concepts[10].display_name	Linguistics
concepts[11].id	https://openalex.org/C199360897
concepts[11].level	1
concepts[11].score	0.07381138205528259
concepts[11].wikidata	https://www.wikidata.org/wiki/Q9143
concepts[11].display_name	Programming language
concepts[12].id	https://openalex.org/C205649164
concepts[12].level	0
concepts[12].score	0.06418609619140625
concepts[12].wikidata	https://www.wikidata.org/wiki/Q1071
concepts[12].display_name	Geography
concepts[13].id	https://openalex.org/C13280743
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q131089
concepts[13].display_name	Geodesy
concepts[14].id	https://openalex.org/C31258907
concepts[14].level	1
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q1301371
concepts[14].display_name	Computer network
concepts[15].id	https://openalex.org/C138885662
concepts[15].level	0
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q5891
concepts[15].display_name	Philosophy
keywords[0].id	https://openalex.org/keywords/computer-science
keywords[0].score	0.8207255601882935
keywords[0].display_name	Computer science
keywords[1].id	https://openalex.org/keywords/benchmark
keywords[1].score	0.7365621328353882
keywords[1].display_name	Benchmark (surveying)
keywords[2].id	https://openalex.org/keywords/comprehension
keywords[2].score	0.6963722705841064
keywords[2].display_name	Comprehension
keywords[3].id	https://openalex.org/keywords/natural-language-processing
keywords[3].score	0.6501059532165527
keywords[3].display_name	Natural language processing
keywords[4].id	https://openalex.org/keywords/vocabulary
keywords[4].score	0.6409082412719727
keywords[4].display_name	Vocabulary
keywords[5].id	https://openalex.org/keywords/artificial-intelligence
keywords[5].score	0.6219724416732788
keywords[5].display_name	Artificial intelligence
keywords[6].id	https://openalex.org/keywords/language-model
keywords[6].score	0.5189035534858704
keywords[6].display_name	Language model
keywords[7].id	https://openalex.org/keywords/reading
keywords[7].score	0.4581510126590729
keywords[7].display_name	Reading (process)
keywords[8].id	https://openalex.org/keywords/reading-comprehension
keywords[8].score	0.45468780398368835
keywords[8].display_name	Reading comprehension
keywords[9].id	https://openalex.org/keywords/resource
keywords[9].score	0.4384937882423401
keywords[9].display_name	Resource (disambiguation)
keywords[10].id	https://openalex.org/keywords/linguistics
keywords[10].score	0.26503026485443115
keywords[10].display_name	Linguistics
keywords[11].id	https://openalex.org/keywords/programming-language
keywords[11].score	0.07381138205528259
keywords[11].display_name	Programming language
keywords[12].id	https://openalex.org/keywords/geography
keywords[12].score	0.06418609619140625
keywords[12].display_name	Geography
language	en
locations[0].id	pmh:oai:arXiv.org:2308.16884
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2308.16884
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2308.16884
locations[1].id	doi:10.48550/arxiv.2308.16884
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article-journal
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2308.16884
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5029975638
authorships[0].author.orcid
authorships[0].author.display_name	Lucas Bandarkar
authorships[0].author_position	first
authorships[0].raw_author_name	Bandarkar, Lucas
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5112447939
authorships[1].author.orcid
authorships[1].author.display_name	Davis Liang
authorships[1].author_position	middle
authorships[1].raw_author_name	Liang, Davis
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5079873734
authorships[2].author.orcid	https://orcid.org/0000-0002-4463-2873
authorships[2].author.display_name	Benjamin Müller
authorships[2].author_position	middle
authorships[2].raw_author_name	Muller, Benjamin
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5023341622
authorships[3].author.orcid
authorships[3].author.display_name	Mikel Artetxe
authorships[3].author_position	middle
authorships[3].raw_author_name	Artetxe, Mikel
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5065117717
authorships[4].author.orcid
authorships[4].author.display_name	Satya Narayan Shukla
authorships[4].author_position	middle
authorships[4].raw_author_name	Shukla, Satya Narayan
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5092732683
authorships[5].author.orcid
authorships[5].author.display_name	Donald Husa
authorships[5].author_position	middle
authorships[5].raw_author_name	Husa, Donald
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5075834790
authorships[6].author.orcid	https://orcid.org/0000-0002-7565-4303
authorships[6].author.display_name	Naman Goyal
authorships[6].author_position	middle
authorships[6].raw_author_name	Goyal, Naman
authorships[6].is_corresponding	False
authorships[7].author.id	https://openalex.org/A5044240506
authorships[7].author.orcid
authorships[7].author.display_name	Abhinandan Krishnan
authorships[7].author_position	middle
authorships[7].raw_author_name	Krishnan, Abhinandan
authorships[7].is_corresponding	False
authorships[8].author.id	https://openalex.org/A5067919401
authorships[8].author.orcid	https://orcid.org/0009-0008-8296-0764
authorships[8].author.display_name	Luke Zettlemoyer
authorships[8].author_position	middle
authorships[8].raw_author_name	Zettlemoyer, Luke
authorships[8].is_corresponding	False
authorships[9].author.id	https://openalex.org/A5054253075
authorships[9].author.orcid
authorships[9].author.display_name	Madian Khabsa
authorships[9].author_position	last
authorships[9].raw_author_name	Khabsa, Madian
authorships[9].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2308.16884
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10181
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9994999766349792
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Natural Language Processing Techniques
related_works	https://openalex.org/W2378211422, https://openalex.org/W2745001401, https://openalex.org/W4321353415, https://openalex.org/W2130974462, https://openalex.org/W2028665553, https://openalex.org/W2086519370, https://openalex.org/W2082296339, https://openalex.org/W2161828220, https://openalex.org/W1972348076, https://openalex.org/W2083863157
cited_by_count	1
counts_by_year[0].year	2024
counts_by_year[0].cited_by_count	1
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2308.16884
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2308.16884
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2308.16884
primary_location.id	pmh:oai:arXiv.org:2308.16884
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2308.16884
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2308.16884
publication_date	2023-08-31
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	3, 44
abstract_inverted_index.We	0, 100, 119, 146
abstract_inverted_index.in	33, 130
abstract_inverted_index.is	41
abstract_inverted_index.of	19, 30, 68, 94, 108, 177
abstract_inverted_index.on	43, 75, 137, 161
abstract_inverted_index.to	61, 81, 104
abstract_inverted_index.up	167
abstract_inverted_index.122	11
abstract_inverted_index.NLP	178
abstract_inverted_index.The	56, 72
abstract_inverted_index.all	98
abstract_inverted_index.and	36, 51, 114, 123, 153, 172
abstract_inverted_index.far	143
abstract_inverted_index.for	170
abstract_inverted_index.has	52
abstract_inverted_index.its	76
abstract_inverted_index.new	168
abstract_inverted_index.own	77
abstract_inverted_index.the	16, 28, 48, 106, 174
abstract_inverted_index.use	101
abstract_inverted_index.Each	39
abstract_inverted_index.MLMs	135
abstract_inverted_index.also	147
abstract_inverted_index.data	140
abstract_inverted_index.find	124
abstract_inverted_index.four	53
abstract_inverted_index.from	47
abstract_inverted_index.more	144
abstract_inverted_index.much	133
abstract_inverted_index.size	152
abstract_inverted_index.text	31
abstract_inverted_index.that	125, 149
abstract_inverted_index.this	25, 89, 102
abstract_inverted_index.were	58
abstract_inverted_index.with	65, 158
abstract_inverted_index.(MRC)	8
abstract_inverted_index.(NLU)	23
abstract_inverted_index.Being	86
abstract_inverted_index.LLMs,	132
abstract_inverted_index.based	42
abstract_inverted_index.fully	87
abstract_inverted_index.large	115
abstract_inverted_index.model	95
abstract_inverted_index.opens	166
abstract_inverted_index.short	45
abstract_inverted_index.still	141
abstract_inverted_index.(MLMs)	113
abstract_inverted_index.across	97
abstract_inverted_index.better	159
abstract_inverted_index.direct	92
abstract_inverted_index.enough	80
abstract_inverted_index.high-,	34
abstract_inverted_index.larger	150
abstract_inverted_index.levels	67
abstract_inverted_index.masked	110
abstract_inverted_index.models	32, 64, 112, 117
abstract_inverted_index.proves	78
abstract_inverted_index.(LLMs).	118
abstract_inverted_index.English	73
abstract_inverted_index.avenues	169
abstract_inverted_index.between	63
abstract_inverted_index.curated	60
abstract_inverted_index.dataset	9, 26, 50, 74, 90, 103
abstract_inverted_index.despite	126
abstract_inverted_index.enables	27, 91
abstract_inverted_index.general	69
abstract_inverted_index.machine	5
abstract_inverted_index.models.	85
abstract_inverted_index.natural	20
abstract_inverted_index.observe	148
abstract_inverted_index.passage	46
abstract_inverted_index.present	1, 120
abstract_inverted_index.reading	6
abstract_inverted_index.results	122
abstract_inverted_index.smaller	134
abstract_inverted_index.Belebele	165
abstract_inverted_index.Overall,	164
abstract_inverted_index.answers.	55
abstract_inverted_index.balanced	138
abstract_inverted_index.coverage	18
abstract_inverted_index.evaluate	105
abstract_inverted_index.language	12, 17, 21, 70, 84, 111, 116
abstract_inverted_index.medium-,	35
abstract_inverted_index.question	40
abstract_inverted_index.spanning	10
abstract_inverted_index.systems.	179
abstract_inverted_index.transfer	129
abstract_inverted_index.Belebele,	2
abstract_inverted_index.analyzing	173
abstract_inverted_index.carefully	59
abstract_inverted_index.challenge	82
abstract_inverted_index.conscious	154
abstract_inverted_index.correlate	157
abstract_inverted_index.different	66
abstract_inverted_index.difficult	79
abstract_inverted_index.expanding	15
abstract_inverted_index.extensive	121
abstract_inverted_index.parallel,	88
abstract_inverted_index.questions	57
abstract_inverted_index.variants.	13
abstract_inverted_index.Flores-200	49
abstract_inverted_index.comparison	93
abstract_inverted_index.evaluating	171
abstract_inverted_index.evaluation	29
abstract_inverted_index.languages.	38, 99, 145, 163
abstract_inverted_index.pretrained	136
abstract_inverted_index.understand	142
abstract_inverted_index.vocabulary	151, 155
abstract_inverted_index.benchmarks,	24
abstract_inverted_index.performance	96, 160
abstract_inverted_index.significant	127
abstract_inverted_index.capabilities	107, 176
abstract_inverted_index.construction	156
abstract_inverted_index.discriminate	62
abstract_inverted_index.low-resource	37, 162
abstract_inverted_index.multilingual	109, 139, 175
abstract_inverted_index.Significantly	14
abstract_inverted_index.comprehension	7
abstract_inverted_index.cross-lingual	128
abstract_inverted_index.understanding	22
abstract_inverted_index.comprehension.	71
abstract_inverted_index.English-centric	131
abstract_inverted_index.multiple-choice	4, 54
abstract_inverted_index.state-of-the-art	83
cited_by_percentile_year.max	94
cited_by_percentile_year.min	90
countries_distinct_count	0
institutions_distinct_count	10
citation_normalized_percentile.value	0.57817245
citation_normalized_percentile.is_in_top_1_percent	False
citation_normalized_percentile.is_in_top_10_percent	False
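
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the zero-based positions where it occurs. A minimal sketch for turning those rows back into running text follows. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the sketch assumes each row is a tab-separated `key<TAB>positions` pair as displayed here; it is not an official OpenAlex client.

```python
from typing import Dict, Iterable, List

PREFIX = "abstract_inverted_index."


def parse_rows(rows: Iterable[str]) -> Dict[str, List[int]]:
    """Parse flattened 'abstract_inverted_index.<token>\\t<positions>' rows.

    The prefix is stripped by length rather than by splitting on '.',
    because tokens such as 'variants.' contain dots themselves.
    """
    index: Dict[str, List[int]] = {}
    for row in rows:
        key, _, value = row.partition("\t")
        if not key.startswith(PREFIX) or not value.strip():
            continue  # skip unrelated payload rows and empty values
        token = key[len(PREFIX):]
        index[token] = [int(p) for p in value.split(",")]
    return index


def rebuild_abstract(index: Dict[str, List[int]]) -> str:
    """Place every token at each of its positions and join in order."""
    by_position: Dict[int, str] = {}
    for token, positions in index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the rows above, restricted to positions 0-13 (the first sentence).
sample = {
    "We": [0], "present": [1], "Belebele,": [2], "a": [3],
    "multiple-choice": [4], "machine": [5], "reading": [6],
    "comprehension": [7], "(MRC)": [8], "dataset": [9],
    "spanning": [10], "122": [11], "language": [12], "variants.": [13],
}
print(rebuild_abstract(sample))
```

Feeding all `abstract_inverted_index.*` rows through `parse_rows` and then `rebuild_abstract` should reproduce the full abstract shown at the top of the record, with tokens separated by single spaces.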