Klaudia Thellmann
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
We present two multilingual LLMs, Teuken 7B-base and Teuken 7B-instruct, designed to embrace Europe’s linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-Englis…
Towards Multilingual LLM Evaluation for European Languages
The rise of Large Language Models (LLMs) has revolutionized natural language processing across numerous languages and tasks. However, evaluating LLM performance in a consistent and meaningful way across multiple European languages remains …
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
The adaptation of multilingual pre-trained LLMs into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of…
Tokenizer Choice For LLM Training: Negligible or Crucial?
The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes, and advancements in pretraining objectives, leaving tokenizer i…