Yonghao Zhuang
Efficient Long-context Language Model Training by Core Attention Disaggregation
We present core attention disaggregation (CAD), a technique that improves long-context large language model training by decoupling the core attention computation, softmax(QK^T)V, from the rest of the model and executing it on a separate po…
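For reference, the computation the paper decouples is the attention kernel itself. A minimal NumPy sketch of softmax(QK^T)V (illustrative only; it includes the usual 1/sqrt(d) scaling and says nothing about CAD's disaggregated execution):

import numpy as np

def core_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays; returns (seq_len, d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V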
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reas…
A Large-Scale Foundation Model for RNA Enables Diverse Function and Structure Prediction
Accurately predicting RNA structures and functions from nucleotide sequences, or conversely, designing sequences to meet structural and functional requirements, remains a fundamental challenge in RNA biology, largely due to limited annotat…
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree open-source approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest …
Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale
Language models applied to protein sequences have become a panacea, enabling therapeutics development, materials engineering, and core biology research. Despite the successes of protein language models, genome language models remain nascen…
Mixture of Experts Enable Efficient and Effective Protein Understanding and Design
Proteins play a fundamental role in life. Understanding the language of proteins offers significant potential for gaining mechanistic insights into biological systems and introduces new avenues for treating diseases, enhancing agriculture,…
Scaling Dense Representations for Single Cell with Transcriptome-Scale Context
Developing a unified model of cellular systems is a canonical challenge in biology. Recently, a wealth of public single-cell RNA sequencing data as well as rapid scaling of self-supervised learning methods have provided new avenues to addr…
A Large-Scale Foundation Model for RNA Function and Structure Prediction
Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding …
Official Implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
This artifact is the official open-source implementation for the ASPLOS 2025 paper "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow". It contains the simulator and the prototype system used in the paper. …
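As background for the max-flow framing, here is a toy example of the underlying primitive (illustrative only, using networkx; the graph below is a made-up placeholder, not the paper's actual encoding of GPUs and network links):

import networkx as nx

G = nx.DiGraph()
# Edge capacities could model, e.g., GPU compute and network bandwidth limits.
G.add_edge("source", "gpu_a", capacity=4)
G.add_edge("source", "gpu_b", capacity=2)
G.add_edge("gpu_a", "gpu_c", capacity=3)
G.add_edge("gpu_b", "gpu_c", capacity=2)
G.add_edge("gpu_c", "sink", capacity=5)

flow_value, flow_dict = nx.maximum_flow(G, "source", "sink")
print(flow_value)  # 5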
Toward Inference-optimal Mixture-of-Expert Large Language Models
Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Lik…
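For context, a minimal sketch of top-k expert routing (illustrative NumPy only, not the paper's architecture): total parameters grow with the number of experts, but each token activates only k of them, so per-token compute stays roughly flat as experts are added.

import numpy as np

rng = np.random.default_rng(0)
d, num_experts, k = 16, 8, 2
W_gate = rng.normal(size=(d, num_experts))
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]

def moe_layer(x):
    # x: (d,) token embedding -> (d,) output from its top-k experts.
    logits = x @ W_gate
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=d)).shape)  # (16,)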
LLM360: Towards Fully Transparent Open-Source LLMs
The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final mod…
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containi…
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to…
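A minimal sketch of the pairwise-judging setup the paper studies (the prompt wording here is paraphrased, and call_llm is a hypothetical placeholder for any chat-completion client):

JUDGE_TEMPLATE = """\
[System] Please act as an impartial judge and evaluate the quality of the
responses provided by two AI assistants to the user question below. Output
"[[A]]" if assistant A is better, "[[B]]" if assistant B is better, and
"[[C]]" for a tie.

[Question] {question}
[Assistant A] {answer_a}
[Assistant B] {answer_b}
"""

def judge_pair(question, answer_a, answer_b, call_llm):
    # call_llm: placeholder taking a prompt string and returning the judge's text.
    verdict = call_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    for tag, label in (("[[A]]", "A"), ("[[B]]", "B"), ("[[C]]", "tie")):
        if tag in verdict:
            return label
    return "unparsed"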
On Optimizing the Communication of Model Parallelism
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operato…
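For a concrete picture of what resharding means, here is an illustrative JAX snippet that moves an array between two sharding layouts on a single mesh; cross-mesh resharding is the harder case the paper targets, where source and destination are different device meshes. (Assumes 8 devices; on CPU, run with XLA_FLAGS=--xla_force_host_platform_device_count=8.)

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devs = np.array(jax.devices())            # assumes 8 devices here
mesh = Mesh(devs, axis_names=("x",))

row_sharded = NamedSharding(mesh, P("x", None))  # split along rows
col_sharded = NamedSharding(mesh, P(None, "x"))  # split along columns

x = jax.device_put(jnp.arange(64.0).reshape(8, 8), row_sharded)
y = jax.device_put(x, col_sharded)  # resharding: all-to-all-style data movement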
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a…
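A schematic usage sketch adapted from Alpa's published examples (API details vary across versions, and create_train_state plus the model_state methods below are placeholders):

import alpa
import jax.numpy as jnp

@alpa.parallelize            # Alpa generates the inter-/intra-operator plan
def train_step(model_state, batch):
    def loss_func(params):
        out = model_state.forward(params, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)
    grads = alpa.grad(loss_func)(model_state.params)  # alpa.grad in place of jax.grad
    return model_state.apply_gradient(grads)

# model_state = create_train_state()   # placeholder setup
# for batch in data_loader:
#     model_state = train_step(model_state, batch)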