Training Optimal Large Diffusion Language Models
Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Shieh
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.03280 · OA: W4414968636
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs). It covers both compute-constrained and data-constrained regimes and studies the key modeling and optimization design choices. Quokka complements Chinchilla and covers a wider scope. We hope the results will provide short-term practical guidance for DLM training and long-term inspiration for the wider AI community.
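The abstract does not state Quokka's functional form or fitted coefficients. For orientation only, the sketch below shows the Chinchilla-style parametric loss the abstract alludes to, L(N, D) = E + A/N^α + B/D^β, where N is the parameter count and D is the number of training tokens. It pairs this loss with the common dense-transformer approximation C ≈ 6ND training FLOPs and solves in closed form for the compute-optimal split of a budget C. The class name `ParametricLoss` and every coefficient value are hypothetical placeholders rather than the paper's results. The data-constrained regime (repeated tokens) is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParametricLoss:
    """Chinchilla-style loss L(N, D) = E + A / N**alpha + B / D**beta.

    Hypothetical helper for illustration; coefficients are placeholders,
    NOT Quokka's fitted values.
    """
    E: float      # irreducible loss
    A: float      # model-size term scale
    B: float      # data term scale
    alpha: float  # model-size exponent
    beta: float   # data exponent

    def loss(self, n_params: float, n_tokens: float) -> float:
        return self.E + self.A / n_params**self.alpha + self.B / n_tokens**self.beta

    def compute_optimal(self, compute_flops: float) -> tuple[float, float]:
        """Minimize loss subject to the assumed budget C = 6 * N * D.

        Setting dL/dN = 0 along the budget line gives
        N_opt = G * (C/6)**(beta/(alpha+beta)) and
        D_opt = (C/6)**(alpha/(alpha+beta)) / G,
        with G = (alpha*A / (beta*B))**(1/(alpha+beta)).
        """
        a, b = self.alpha, self.beta
        g = (a * self.A / (b * self.B)) ** (1.0 / (a + b))
        scale = compute_flops / 6.0
        n_opt = g * scale ** (b / (a + b))
        d_opt = scale ** (a / (a + b)) / g
        return n_opt, d_opt


# Placeholder coefficients, chosen only to make the example run.
law = ParametricLoss(E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28)
n, d = law.compute_optimal(1e21)
print(f"N ~ {n:.3g} params, D ~ {d:.3g} tokens, L ~ {law.loss(n, d):.3f}")
```

Along a fixed budget, the loss is strictly convex in log N, so shifting compute from parameters to tokens (or the reverse) away from the returned split can only raise the predicted loss. A DLM-specific law would refit E, A, B, α and β, and possibly the functional form itself, on diffusion-model training runs.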