CoDA: Coding LM via Diffusion Adaptation
Haolin Chen, Shiyu Wang, Can Qin, Bo Pang, Zuxin Liu, Jielin Qiu, Jianguo Zhang, Yingbo Zhou, Zeyuan Chen, Ran Xu, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang, Weiran Yao
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.03270
Diffusion language models promise bidirectional context and infilling capabilities that autoregressive coders lack, yet practical systems remain heavyweight. We introduce CoDA, a 1.7B-parameter diffusion coder trained on TPU with a fully open-source training pipeline. CoDA pairs large-scale diffusion pre-training with code-centric mid-training and instruction tuning, enabling confidence-guided sampling that keeps inference latency competitive. On HumanEval, MBPP, and EvalPlus, CoDA-1.7B-Instruct matches or surpasses diffusion models of up to 7B parameters. Our release includes model checkpoints, evaluation harnesses, and TPU training pipelines to accelerate research on lightweight diffusion-based coding assistants.
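The confidence-guided sampling mentioned in the abstract can be illustrated with a minimal sketch: at each denoising pass, the model scores every still-masked position and commits only the predictions it is most confident about, so the sequence is revealed in a few passes rather than one token at a time. The dummy_model, MASK_ID, and fixed per-pass unmasking budget below are illustrative assumptions, not CoDA's released decoding code.

```python
# Minimal sketch of confidence-guided sampling for a masked diffusion LM.
# dummy_model, MASK_ID, and the fixed unmasking budget are illustrative
# assumptions, not CoDA's released implementation.
import torch

VOCAB_SIZE = 32                 # toy vocabulary
MASK_ID = VOCAB_SIZE            # hypothetical [MASK] id outside the vocab
SEQ_LEN = 16
NUM_STEPS = 4                   # denoising passes; fewer passes = lower latency


def dummy_model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for the diffusion LM: per-position logits over the vocab."""
    return torch.randn(tokens.shape[0], tokens.shape[1], VOCAB_SIZE)


@torch.no_grad()
def confidence_guided_sample(model, seq_len: int = SEQ_LEN, steps: int = NUM_STEPS):
    # Start from a fully masked sequence (batch size 1).
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)
    budget = seq_len // steps            # positions to commit per pass

    for _ in range(steps):
        logits = model(tokens)                     # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)    # per-position confidence and argmax token

        # Only still-masked positions compete for unmasking this pass.
        masked = tokens == MASK_ID
        conf = conf.masked_fill(~masked, -1.0)

        # Commit the highest-confidence predictions; leave the rest masked.
        k = min(budget, int(masked.sum()))
        top = conf.topk(k, dim=-1).indices
        tokens.scatter_(1, top, pred.gather(1, top))

    return tokens


if __name__ == "__main__":
    print(confidence_guided_sample(dummy_model))
```

In a real system the confidence scores come from the trained model rather than random logits, and the per-pass budget may follow a noise schedule; the point of the sketch is only that committing several confident tokens per pass lets a diffusion coder trade decoding steps for latency.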
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2510.03270
- PDF: https://arxiv.org/pdf/2510.03270
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414968642