EuroLLM-9B: Technical Report
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2506.04079
This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-9B's development, including tokenizer design, architectural specifications, data filtering, and training procedures. We describe the pre-training data collection and filtering pipeline, including the creation of EuroFilter, an AI-based multilingual filter, as well as the design of EuroBlocks-Synthetic, a novel synthetic dataset for post-training that enhances language coverage for European languages. Evaluation results demonstrate EuroLLM-9B's competitive performance on multilingual benchmarks and machine translation tasks, establishing it as the leading open European-made LLM of its size. To support open research and adoption, we release all major components of this work, including the base and instruction-tuned models, the EuroFilter classifier, and the synthetic post-training dataset.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2506.04079, https://arxiv.org/pdf/2506.04079
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416075033
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416075033 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2506.04079 (Digital Object Identifier)
- Title: EuroLLM-9B: Technical Report (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-06-04 (full publication date if available)
- Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Ricardo Rei, M. Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José P. Pombal, Nicolas Boizard, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins (list of authors in order)
- Landing page: https://arxiv.org/abs/2506.04079 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2506.04079 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2506.04079 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4416075033 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2506.04079 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.04079 |
| ids.openalex | https://openalex.org/W4416075033 |
| fwci | 0.0 |
| type | preprint |
| title | EuroLLM-9B: Technical Report |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.04079 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.04079 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.04079 |
| locations[1].id | doi:10.48550/arxiv.2506.04079 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.04079 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101708734 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8038-9073 |
| authorships[0].author.display_name | Pedro Henrique Martins |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Martins, Pedro Henrique |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5103829551 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | João Alves |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Alves, João |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5061762931 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Patrick Fernandes |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Fernandes, Patrick |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5039347839 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-8265-1939 |
| authorships[3].author.display_name | Ricardo Rei |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Rei, Ricardo |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5019543373 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6384-5332 |
| authorships[4].author.display_name | M. Amin Farajian |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Farajian, Amin |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5066205759 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Mateusz Klimaszewski |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Klimaszewski, Mateusz |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5007183487 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Duarte M. Alves |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Alves, Duarte M. |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5070177445 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-4900-630X |
| authorships[7].author.display_name | José P. Pombal |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Pombal, José |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5064115506 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Nicolas Boizard |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Boizard, Nicolas |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5093123135 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Manuel Faysse |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Faysse, Manuel |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5103969906 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Pierre Colombo |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Colombo, Pierre |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5030615769 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-7972-7442 |
| authorships[11].author.display_name | François Yvon |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Yvon, François |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5110781707 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | Barry Haddow |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Haddow, Barry |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5103075338 |
| authorships[13].author.orcid | https://orcid.org/0000-0001-6344-7633 |
| authorships[13].author.display_name | José G. C. de Souza |
| authorships[13].author_position | middle |
| authorships[13].raw_author_name | de Souza, José G. C. |
| authorships[13].is_corresponding | False |
| authorships[14].author.id | https://openalex.org/A5038456766 |
| authorships[14].author.orcid | https://orcid.org/0000-0002-9022-3405 |
| authorships[14].author.display_name | Alexandra Birch |
| authorships[14].author_position | last |
| authorships[14].raw_author_name | Birch, Alexandra |
| authorships[14].is_corresponding | False |
| authorships[15].author.id | https://openalex.org/A5051693368 |
| authorships[15].author.orcid | https://orcid.org/0000-0001-8282-625X |
| authorships[15].author.display_name | André F. T. Martins |
| authorships[15].author_position | middle |
| authorships[15].raw_author_name | Martins, André F. T. |
| authorships[15].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.04079 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | EuroLLM-9B: Technical Report |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T09:52:06.185452 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.04079 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.04079 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.04079 |
| primary_location.id | pmh:oai:arXiv.org:2506.04079 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.04079 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.04079 |
| publication_date | 2025-06-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 4, 49, 90 |
| abstract_inverted_index.11 | 27 |
| abstract_inverted_index.24 | 21 |
| abstract_inverted_index.To | 127 |
| abstract_inverted_index.We | 47, 65 |
| abstract_inverted_index.an | 79 |
| abstract_inverted_index.as | 83, 85, 118 |
| abstract_inverted_index.by | 18 |
| abstract_inverted_index.in | 41 |
| abstract_inverted_index.it | 117 |
| abstract_inverted_index.of | 15, 34, 52, 77, 88, 124, 138 |
| abstract_inverted_index.on | 109 |
| abstract_inverted_index.to | 11 |
| abstract_inverted_index.we | 133 |
| abstract_inverted_index.LLM | 123 |
| abstract_inverted_index.all | 20, 135 |
| abstract_inverted_index.and | 26, 39, 62, 71, 112, 131, 144, 150 |
| abstract_inverted_index.for | 94, 100 |
| abstract_inverted_index.its | 125 |
| abstract_inverted_index.the | 13, 32, 67, 75, 86, 119, 142, 147, 151 |
| abstract_inverted_index.This | 0 |
| abstract_inverted_index.base | 143 |
| abstract_inverted_index.data | 60, 69 |
| abstract_inverted_index.from | 9 |
| abstract_inverted_index.open | 43, 121, 129 |
| abstract_inverted_index.that | 96 |
| abstract_inverted_index.this | 139 |
| abstract_inverted_index.well | 84 |
| abstract_inverted_index.Union | 24 |
| abstract_inverted_index.being | 37 |
| abstract_inverted_index.issue | 33 |
| abstract_inverted_index.large | 5, 44 |
| abstract_inverted_index.major | 136 |
| abstract_inverted_index.model | 7 |
| abstract_inverted_index.needs | 14 |
| abstract_inverted_index.novel | 91 |
| abstract_inverted_index.size. | 126 |
| abstract_inverted_index.work, | 140 |
| abstract_inverted_index.design | 87 |
| abstract_inverted_index.report | 1 |
| abstract_inverted_index.tasks, | 115 |
| abstract_inverted_index.EuroLLM | 30 |
| abstract_inverted_index.dataset | 93 |
| abstract_inverted_index.design, | 57 |
| abstract_inverted_index.filter, | 82 |
| abstract_inverted_index.leading | 120 |
| abstract_inverted_index.machine | 113 |
| abstract_inverted_index.models, | 146 |
| abstract_inverted_index.models. | 46 |
| abstract_inverted_index.provide | 48 |
| abstract_inverted_index.release | 134 |
| abstract_inverted_index.results | 104 |
| abstract_inverted_index.scratch | 10 |
| abstract_inverted_index.support | 12, 128 |
| abstract_inverted_index.trained | 8 |
| abstract_inverted_index.AI-based | 80 |
| abstract_inverted_index.European | 16, 23, 35, 101 |
| abstract_inverted_index.citizens | 17 |
| abstract_inverted_index.coverage | 99 |
| abstract_inverted_index.covering | 19 |
| abstract_inverted_index.creation | 76 |
| abstract_inverted_index.dataset. | 154 |
| abstract_inverted_index.describe | 66 |
| abstract_inverted_index.enhances | 97 |
| abstract_inverted_index.existing | 42 |
| abstract_inverted_index.language | 6, 45, 98 |
| abstract_inverted_index.official | 22 |
| abstract_inverted_index.overview | 51 |
| abstract_inverted_index.presents | 2 |
| abstract_inverted_index.research | 130 |
| abstract_inverted_index.training | 63 |
| abstract_inverted_index.addresses | 31 |
| abstract_inverted_index.adoption, | 132 |
| abstract_inverted_index.filtering | 72 |
| abstract_inverted_index.including | 55, 74, 141 |
| abstract_inverted_index.languages | 25, 36 |
| abstract_inverted_index.pipeline, | 73 |
| abstract_inverted_index.synthetic | 92, 152 |
| abstract_inverted_index.tokenizer | 56 |
| abstract_inverted_index.EuroFilter | 148 |
| abstract_inverted_index.Evaluation | 103 |
| abstract_inverted_index.additional | 28 |
| abstract_inverted_index.benchmarks | 111 |
| abstract_inverted_index.collection | 70 |
| abstract_inverted_index.components | 137 |
| abstract_inverted_index.filtering, | 61 |
| abstract_inverted_index.languages. | 29, 102 |
| abstract_inverted_index.EuroFilter, | 78 |
| abstract_inverted_index.EuroLLM-9B, | 3 |
| abstract_inverted_index.classifier, | 149 |
| abstract_inverted_index.competitive | 107 |
| abstract_inverted_index.demonstrate | 105 |
| abstract_inverted_index.performance | 108 |
| abstract_inverted_index.procedures. | 64 |
| abstract_inverted_index.translation | 114 |
| abstract_inverted_index.underserved | 40 |
| abstract_inverted_index.EuroLLM-9B's | 53, 106 |
| abstract_inverted_index.development, | 54 |
| abstract_inverted_index.establishing | 116 |
| abstract_inverted_index.multilingual | 81, 110 |
| abstract_inverted_index.pre-training | 68 |
| abstract_inverted_index.European-made | 122 |
| abstract_inverted_index.architectural | 58 |
| abstract_inverted_index.comprehensive | 50 |
| abstract_inverted_index.post-training | 95, 153 |
| abstract_inverted_index.specifications, | 59 |
| abstract_inverted_index.underrepresented | 38 |
| abstract_inverted_index.instruction-tuned | 145 |
| abstract_inverted_index.EuroBlocks-Synthetic, | 89 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 16 |
| citation_normalized_percentile |
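
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each word to the positions at which it occurs. The following is a minimal sketch of how such a mapping could be turned back into running text. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of OpenAlex; the sample copies only the entries for positions 0 to 10 from the table above.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a word -> list-of-positions mapping."""
    # Expand the mapping into (position, word) pairs.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Hypothetical sample: entries for positions 0-10 only, taken from the
# payload table (later positions of "a", "large", "language" are omitted).
sample = {
    "This": [0],
    "report": [1],
    "presents": [2],
    "EuroLLM-9B,": [3],
    "a": [4],
    "large": [5],
    "language": [6],
    "model": [7],
    "trained": [8],
    "from": [9],
    "scratch": [10],
}

print(rebuild_abstract(sample))
# → This report presents EuroLLM-9B, a large language model trained from scratch
```

Applied to the full set of `abstract_inverted_index` rows, the same approach should reproduce the abstract shown at the top of this record, since every word position from 0 to 154 appears exactly once in the index.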