Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks
March 2025 • Paul Suganthan, Fédor Moiseev, Limei Yan, Junru Wu, Jianmo Ni, Jay J. Han, Imed Zitouni, Enrique Alfonseca, Xuanhui Wang, Zhe Dong
Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in tasks like classification, regression, and ranking. This is primarily due to the inherent structure of decoder-based models, which limits their direct applicability to these tasks. In this paper, we introduce Gemma Encoder, adapting the powerful Gemma decoder model to an encoder architecture.
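To make the decoder-to-encoder adaptation concrete, a common recipe (a general illustration, not the authors' implementation) is to drop the causal attention mask so every token attends bidirectionally, then pool the token representations into a fixed vector feeding a task head for classification, regression, or ranking. The minimal PyTorch sketch below shows that recipe on a toy transformer; all names and sizes here (`TinyDecoderLayer`, `ToyEncoderAdapter`, the dimensions) are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class TinyDecoderLayer(nn.Module):
    """One pre-norm transformer layer; `causal` toggles the attention mask."""
    def __init__(self, d_model=64, n_heads=4, causal=True):
        super().__init__()
        self.causal = causal
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        mask = None
        if self.causal:
            # Decoder behavior: token i may only attend to tokens <= i.
            seq_len = x.size(1)
            mask = torch.triu(
                torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

class ToyEncoderAdapter(nn.Module):
    """Decoder-to-encoder adaptation: bidirectional attention plus a pooled head."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # causal=False makes every layer's attention bidirectional.
        self.layers = nn.ModuleList(
            TinyDecoderLayer(d_model, causal=False) for _ in range(n_layers))
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for layer in self.layers:
            x = layer(x)
        pooled = x.mean(dim=1)           # mean pooling over the sequence
        return self.classifier(pooled)   # logits for classification/ranking

logits = ToyEncoderAdapter()(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 2])
```

Mean pooling is only one option; a real adaptation would weigh it against alternatives such as last-token or attention-based pooling, since the choice of pooling strategy materially affects downstream task quality.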