Enzhe Lu
Kimi-VL Technical Report
We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters…
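The efficiency claim rests on sparse expert activation: in an MoE layer, each token is routed to only a few experts, so only a fraction of the total parameter count is exercised per forward pass. Below is a minimal top-k routing sketch in PyTorch to illustrate the mechanism; the module names, shapes, and routing details are illustrative assumptions, not Kimi-VL's actual architecture.

```python
# A minimal sketch of sparse MoE routing (not Kimi-VL's implementation):
# each token is sent to only its top-k experts, so only a fraction of the
# total parameters is activated per forward pass.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)               # (tokens, k): where expert e was selected
            rows = mask.any(dim=-1)
            if rows.any():
                w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
y = moe(torch.randn(10, 64))  # only 2 of 8 experts' parameters used per token
```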
Muon is Scalable for LLM Training
Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven. We identify two crucial techniques for scaling…
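For context, Muon's core update orthogonalizes the momentum-accumulated gradient of each 2D weight matrix, typically via a Newton-Schulz iteration. The sketch below shows that core step only; the quintic coefficients follow (as I understand it) the public reference implementation, and the scaling techniques the paper identifies for large-model training are not reproduced here.

```python
# A minimal sketch of Muon's core update: momentum followed by approximate
# orthogonalization of the 2D update matrix via Newton-Schulz iteration.
# Coefficients are taken from the public reference implementation; the
# paper's scaled, distributed variant adds further techniques not shown.
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Iteratively push the singular values of G toward 1, approximating
    # the nearest orthogonal factor U V^T of G = U S V^T.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                        # iterate on the short side
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(W, G, M, lr=0.02, momentum=0.95):
    # M is the persistent momentum buffer for this weight matrix.
    M.mul_(momentum).add_(G)
    W.add_(newton_schulz5(M), alpha=-lr)

W = torch.randn(128, 64)
M = torch.zeros_like(W)
muon_step(W, torch.randn_like(W), M)
```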
MoBA: Mixture of Block Attention for Long-Context LLMs
Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms…
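MoBA's remedy, following MoE-style routing, is to partition the keys into blocks and let each query attend to only a small top-k set of blocks selected by a gate. The sketch below illustrates that block-selection idea in simplified form; it omits causal masking and MoBA's rule that a query always attends to its own block, and all shapes and names are illustrative assumptions.

```python
# A simplified sketch of block attention: keys are partitioned into blocks,
# each query scores blocks via their mean-pooled keys, and attention is
# computed only over the top-k selected blocks. Causal masking and the
# always-attend-to-current-block rule from MoBA are omitted.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size: int, topk: int):
    # q, k, v: (seq, d); seq must be divisible by block_size here.
    seq, d = k.shape
    n_blocks = seq // block_size
    k_blocks = k.view(n_blocks, block_size, d)
    v_blocks = v.view(n_blocks, block_size, d)
    # Gate: score each block by the query's affinity to its mean-pooled key.
    gate = q @ k_blocks.mean(dim=1).T            # (seq, n_blocks)
    sel = gate.topk(topk, dim=-1).indices        # (seq, topk)
    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        ks = k_blocks[sel[i]].reshape(-1, d)     # keys from selected blocks only
        vs = v_blocks[sel[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ ks.T / d ** 0.5, dim=-1)
        out[i] = attn @ vs
    return out

q = k = v = torch.randn(64, 32)
y = block_sparse_attention(q, k, v, block_size=16, topk=2)
```

Per query, the attention cost drops from O(seq) keys to O(topk * block_size), which is what makes the block gate pay off at long context lengths.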
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of…
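To make the contrast concrete: next-token pretraining maximizes likelihood on a fixed corpus, whereas RL samples completions from the model and reinforces them in proportion to a reward. The snippet below is a generic REINFORCE-style loss for one sampled completion, offered only as a sketch of that axis; it is not the RL formulation used in Kimi k1.5.

```python
# Generic REINFORCE sketch: weight the log-probability of a sampled
# completion by its reward, instead of maximizing likelihood on fixed data.
# Not the specific RL algorithm of Kimi k1.5.
import torch
import torch.nn.functional as F

def reinforce_loss(logits, sampled_ids, reward):
    # logits: (seq, vocab) for the sampled completion; reward: scalar.
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(1, sampled_ids.unsqueeze(1)).squeeze(1)
    return -(reward * token_logp.sum())  # gradient ascent on reward-weighted log-prob

logits = torch.randn(8, 100, requires_grad=True)
ids = torch.randint(0, 100, (8,))
loss = reinforce_loss(logits, ids, reward=1.0)
loss.backward()
```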