Nathan Godey
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process. In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for …
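The two-stage annotation pipeline described above lends itself to simple score-based filtering of the annotated paragraphs. Below is a hypothetical sketch of that step; the field names and score scale are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of score-based paragraph selection, assuming each
# annotated paragraph carries LLM-assigned domain and quality scores.
# "domain" and "educational_score" are illustrative names, not the
# dataset's actual fields.
from dataclasses import dataclass

@dataclass
class Paragraph:
    text: str
    domain: str              # e.g. "clinical", "biomedical", "other"
    educational_score: int   # e.g. 1 (low) to 5 (high), as judged by an LLM

def select_for_pretraining(paragraphs, min_score=4, domains=("clinical", "biomedical")):
    """Keep only paragraphs the LLM rated highly within the target domains."""
    return [p for p in paragraphs
            if p.domain in domains and p.educational_score >= min_score]
```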
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
Autoregressive language models rely on a Key-Value (KV) Cache, which avoids re-computing past hidden states during generation, making it faster. As model sizes and context lengths grow, the KV Cache becomes a significant memory bottleneck,…
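A quick back-of-the-envelope calculation shows why the KV Cache becomes a memory bottleneck at long context lengths. The model dimensions below are illustrative (roughly 7B-scale), not taken from the paper.

```python
# KV cache size: 2 tensors (keys + values) per layer, one vector of size
# n_kv_heads * head_dim per token. Parameters are illustrative only.
def kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                   seq_len=32_768, batch_size=1, bytes_per_elem=2):  # fp16
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

print(f"{kv_cache_bytes() / 2**30:.1f} GiB")  # ~16 GiB for a single 32k-token sequence
```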
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of sm…
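The softmax bottleneck referenced in the title can be illustrated numerically: output logits are a linear function of d-dimensional hidden states, so their rank is capped by d regardless of vocabulary size. A minimal sketch with made-up dimensions:

```python
# Illustration of the softmax bottleneck: the logit matrix over any set of
# contexts has rank at most d (the hidden size), however large the vocabulary.
# Dimensions are illustrative, not those studied in the paper.
import numpy as np

d, vocab, n_contexts = 64, 8_000, 500
W = np.random.randn(vocab, d)          # output (unembedding) matrix
H = np.random.randn(n_contexts, d)     # hidden states for n_contexts contexts
logits = H @ W.T                       # shape (n_contexts, vocab)

print(np.linalg.matrix_rank(logits))   # <= d = 64, despite an 8k-word vocabulary
```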
On the Scaling Laws of Geographical Representation in Language Models
Language models have long been shown to embed geographical information in their hidden representations. This line of work has recently been revisited by extending this result to Large Language Models (LLMs). In this paper, we propose to fi…
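A common way to test whether geographical information is embedded in hidden representations is a linear probe mapping a place name's representation to its coordinates. The sketch below uses random stand-in data and is not necessarily the paper's exact protocol.

```python
# Minimal probing sketch: fit a linear map from a model's hidden
# representation of a place name to its (latitude, longitude).
# The data here is a random stand-in, purely for illustration.
import numpy as np
from sklearn.linear_model import Ridge

hidden = np.random.randn(200, 768)             # stand-in for hidden states of 200 place names
coords = np.random.uniform(-90, 90, (200, 2))  # stand-in for (lat, lon) targets

probe = Ridge(alpha=1.0).fit(hidden[:150], coords[:150])
print(probe.score(hidden[150:], coords[150:])) # held-out R^2 of the probe
```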
Anisotropy Is Inherent to Self-Attention in Transformers
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers. In NLP, it takes the form of anisotropy, a singular property of hidden representations which make…
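Anisotropy is typically quantified as the average pairwise cosine similarity of hidden representations; a value far above zero indicates vectors concentrated in a narrow cone. A minimal sketch with stand-in data:

```python
# Average pairwise cosine similarity as an anisotropy measure.
# Random Gaussian data stands in for actual Transformer hidden states.
import numpy as np

def mean_cosine_similarity(h):
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    sims = h @ h.T
    n = len(h)
    return (sims.sum() - n) / (n * (n - 1))   # exclude self-similarities

print(mean_cosine_similarity(np.random.randn(512, 768)))  # ~0 for isotropic vectors
```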
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and ins…
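The general idea of replacing vocabulary prediction with a contrastive objective over input embeddings can be sketched as follows. This is a rough illustration using in-batch negatives, not necessarily the paper's exact loss.

```python
# Rough sketch: instead of projecting to vocabulary logits, score each output
# hidden state against the input embeddings in the batch and train
# contrastively (the embedding of the true next token is the positive,
# the other in-batch embeddings are negatives). Illustrative only.
import torch
import torch.nn.functional as F

def contrastive_next_token_loss(hidden, target_embeddings):
    # hidden:            (n, d) output states for n positions
    # target_embeddings: (n, d) input embeddings of the true next tokens
    logits = hidden @ target_embeddings.T      # (n, n) similarity matrix
    labels = torch.arange(hidden.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_next_token_loss(torch.randn(8, 256), torch.randn(8, 256))
```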
Is Anisotropy Inherent to Transformers?
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers. In NLP, it takes the form of anisotropy, a singular property of hidden representations which make…
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling
Static subword tokenization algorithms have been an essential component of recent works on language modeling. However, their static nature results in significant flaws that degrade the models' downstream performance and robustness. In this w…
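Gradient-based tokenization can be pictured as a small module that predicts a soft segmentation over input bytes and pools byte embeddings into block embeddings, so segmentation is learned end-to-end with the language model. The sketch below is a simplified illustration under that assumption, not MANTa's actual architecture.

```python
# Simplified sketch of differentiable tokenization: score each byte's soft
# assignment to a fixed number of blocks, then pool byte embeddings into
# block embeddings. Illustrative only, not the paper's architecture.
import torch
import torch.nn as nn

class SoftPoolingTokenizer(nn.Module):
    def __init__(self, d_byte=64, n_blocks=128):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_byte)
        self.block_scorer = nn.Linear(d_byte, n_blocks)   # byte -> block assignment scores

    def forward(self, byte_ids):                          # (batch, seq_len) byte values
        e = self.byte_emb(byte_ids)                       # (batch, seq_len, d_byte)
        assign = self.block_scorer(e).softmax(dim=1)      # soft byte-to-block assignment
        return assign.transpose(1, 2) @ e                 # (batch, n_blocks, d_byte)

blocks = SoftPoolingTokenizer()(torch.randint(0, 256, (2, 512)))
```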