Klaudia Thellmann
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
We present two multilingual LLMs, Teuken 7B-base and Teuken 7B-instruct, designed to embrace Europe’s linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-Englis…
Towards Multilingual LLM Evaluation for European Languages
The rise of Large Language Models (LLMs) has revolutionized natural language processing across numerous languages and tasks. However, evaluating LLM performance in a consistent and meaningful way across multiple European languages remains …
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
The adaptation of multilingual pre-trained LLMs into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of…
Tokenizer Choice For LLM Training: Negligible or Crucial?
The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes, and advancements in pretraining objectives, leaving tokenizer i…