Enzhe Lu
Kimi-VL Technical Report
We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters…
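The efficiency claim rests on sparse expert activation: in an MoE layer, each token is routed to only a few experts, so only a fraction of the total parameter count is exercised per forward pass. Below is a minimal top-k routing sketch in PyTorch to illustrate the mechanism; the module names, shapes, and routing details are illustrative assumptions, not Kimi-VL's actual architecture.

```python
# A minimal sketch of sparse MoE routing (not Kimi-VL's implementation):
# each token is sent to only its top-k experts, so only a fraction of the
# total parameters is activated per forward pass.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)               # (tokens, k): where expert e was selected
            rows = mask.any(dim=-1)
            if rows.any():
                w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
y = moe(torch.randn(10, 64))  # only 2 of 8 experts' parameters used per token
```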
Muon is Scalable for LLM Training
Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven. We identify two crucial techniques for scaling…
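For context, Muon's core update orthogonalizes the momentum-accumulated gradient of each 2D weight matrix, typically via a Newton-Schulz iteration. The sketch below shows that core step only; the quintic coefficients follow (as I understand it) the public reference implementation, and the scaling techniques the paper identifies for large-model training are not reproduced here.

```python
# A minimal sketch of Muon's core update: momentum followed by approximate
# orthogonalization of the 2D update matrix via Newton-Schulz iteration.
# Coefficients are taken from the public reference implementation; the
# paper's scaled, distributed variant adds further techniques not shown.
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Iteratively push the singular values of G toward 1, approximating
    # the nearest orthogonal factor U V^T of G = U S V^T.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                        # iterate on the short side
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(W, G, M, lr=0.02, momentum=0.95):
    # M is the persistent momentum buffer for this weight matrix.
    M.mul_(momentum).add_(G)
    W.add_(newton_schulz5(M), alpha=-lr)

W = torch.randn(128, 64)
M = torch.zeros_like(W)
muon_step(W, torch.randn_like(W), M)
```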
MoBA: Mixture of Block Attention for Long-Context LLMs
Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms…
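MoBA's remedy, following MoE-style routing, is to partition the keys into blocks and let each query attend to only a small top-k set of blocks selected by a gate. The sketch below illustrates that block-selection idea in simplified form; it omits causal masking and MoBA's rule that a query always attends to its own block, and all shapes and names are illustrative assumptions.

```python
# A simplified sketch of block attention: keys are partitioned into blocks,
# each query scores blocks via their mean-pooled keys, and attention is
# computed only over the top-k selected blocks. Causal masking and the
# always-attend-to-current-block rule from MoBA are omitted.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size: int, topk: int):
    # q, k, v: (seq, d); seq must be divisible by block_size here.
    seq, d = k.shape
    n_blocks = seq // block_size
    k_blocks = k.view(n_blocks, block_size, d)
    v_blocks = v.view(n_blocks, block_size, d)
    # Gate: score each block by the query's affinity to its mean-pooled key.
    gate = q @ k_blocks.mean(dim=1).T            # (seq, n_blocks)
    sel = gate.topk(topk, dim=-1).indices        # (seq, topk)
    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        ks = k_blocks[sel[i]].reshape(-1, d)     # keys from selected blocks only
        vs = v_blocks[sel[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ ks.T / d ** 0.5, dim=-1)
        out[i] = attn @ vs
    return out

q = k = v = torch.randn(64, 32)
y = block_sparse_attention(q, k, v, block_size=16, topk=2)
```

Per query, the attention cost drops from O(seq) keys to O(topk * block_size), which is what makes the block gate pay off at long context lengths.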
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of…
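To make the contrast concrete: next-token pretraining maximizes likelihood on a fixed corpus, whereas RL samples completions from the model and reinforces them in proportion to a reward. The snippet below is a generic REINFORCE-style loss for one sampled completion, offered only as a sketch of that axis; it is not the RL formulation used in Kimi k1.5.

```python
# Generic REINFORCE sketch: weight the log-probability of a sampled
# completion by its reward, instead of maximizing likelihood on fixed data.
# Not the specific RL algorithm of Kimi k1.5.
import torch
import torch.nn.functional as F

def reinforce_loss(logits, sampled_ids, reward):
    # logits: (seq, vocab) for the sampled completion; reward: scalar.
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(1, sampled_ids.unsqueeze(1)).squeeze(1)
    return -(reward * token_logp.sum())  # gradient ascent on reward-weighted log-prob

logits = torch.randn(8, 100, requires_grad=True)
ids = torch.randint(0, 100, (8,))
loss = reinforce_loss(logits, ids, reward=1.0)
loss.backward()
```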