Yonghao Zhuang
Efficient Long-context Language Model Training by Core Attention Disaggregation
We present core attention disaggregation (CAD), a technique that improves long-context large language model training by decoupling the core attention computation, softmax(QK^T)V, from the rest of the model and executing it on a separate po…
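For reference, the computation the paper decouples is the attention kernel itself. A minimal NumPy sketch of softmax(QK^T)V (illustrative only; it includes the usual 1/sqrt(d) scaling and says nothing about CAD's disaggregated execution):

import numpy as np

def core_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays; returns (seq_len, d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V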
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reas…
A Large-Scale Foundation Model for RNA Enables Diverse Function and Structure Prediction
Accurately predicting RNA structures and functions from nucleotide sequences, or conversely, designing sequences to meet structural and functional requirements, remains a fundamental challenge in RNA biology, largely due to limited annotat…
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree open-source approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest …
Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale
Language models applied to protein sequences have become a panacea, enabling therapeutics development, materials engineering, and core biology research. Despite the successes of protein language models, genome language models remain nascen…
Mixture of Experts Enable Efficient and Effective Protein Understanding and Design
Proteins play a fundamental role in life. Understanding the language of proteins offers significant potential for gaining mechanistic insights into biological systems and introduces new avenues for treating diseases, enhancing agriculture,…
Scaling Dense Representations for Single Cell with Transcriptome-Scale Context
Developing a unified model of cellular systems is a canonical challenge in biology. Recently, a wealth of public single-cell RNA sequencing data as well as rapid scaling of self-supervised learning methods have provided new avenues to addr…
A Large-Scale Foundation Model for RNA Function and Structure Prediction
Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding …
Official Implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
This artifact is the official open-source implementation for the ASPLOS 2025 paper "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow". It contains the simulator and the prototype system used in the paper. …
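As background for the max-flow framing, here is a toy example of the underlying primitive (illustrative only, using networkx; the graph below is a made-up placeholder, not the paper's actual encoding of GPUs and network links):

import networkx as nx

G = nx.DiGraph()
# Edge capacities could model, e.g., GPU compute and network bandwidth limits.
G.add_edge("source", "gpu_a", capacity=4)
G.add_edge("source", "gpu_b", capacity=2)
G.add_edge("gpu_a", "gpu_c", capacity=3)
G.add_edge("gpu_b", "gpu_c", capacity=2)
G.add_edge("gpu_c", "sink", capacity=5)

flow_value, flow_dict = nx.maximum_flow(G, "source", "sink")
print(flow_value)  # 5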
Toward Inference-optimal Mixture-of-Expert Large Language Models
Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Lik…
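For context, a minimal sketch of top-k expert routing (illustrative NumPy only, not the paper's architecture): total parameters grow with the number of experts, but each token activates only k of them, so per-token compute stays roughly flat as experts are added.

import numpy as np

rng = np.random.default_rng(0)
d, num_experts, k = 16, 8, 2
W_gate = rng.normal(size=(d, num_experts))
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]

def moe_layer(x):
    # x: (d,) token embedding -> (d,) output from its top-k experts.
    logits = x @ W_gate
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=d)).shape)  # (16,)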
LLM360: Towards Fully Transparent Open-Source LLMs
The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final mod…
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containi…
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to…
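A minimal sketch of the pairwise-judging setup the paper studies (the prompt wording here is paraphrased, and call_llm is a hypothetical placeholder for any chat-completion client):

JUDGE_TEMPLATE = """\
[System] Please act as an impartial judge and evaluate the quality of the
responses provided by two AI assistants to the user question below. Output
"[[A]]" if assistant A is better, "[[B]]" if assistant B is better, and
"[[C]]" for a tie.

[Question] {question}
[Assistant A] {answer_a}
[Assistant B] {answer_b}
"""

def judge_pair(question, answer_a, answer_b, call_llm):
    # call_llm: placeholder taking a prompt string and returning the judge's text.
    verdict = call_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    for tag, label in (("[[A]]", "A"), ("[[B]]", "B"), ("[[C]]", "tie")):
        if tag in verdict:
            return label
    return "unparsed"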
On Optimizing the Communication of Model Parallelism
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operato…
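For a concrete picture of what resharding means, here is an illustrative JAX snippet that moves an array between two sharding layouts on a single mesh; cross-mesh resharding is the harder case the paper targets, where source and destination are different device meshes. (Assumes 8 devices; on CPU, run with XLA_FLAGS=--xla_force_host_platform_device_count=8.)

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devs = np.array(jax.devices())            # assumes 8 devices here
mesh = Mesh(devs, axis_names=("x",))

row_sharded = NamedSharding(mesh, P("x", None))  # split along rows
col_sharded = NamedSharding(mesh, P(None, "x"))  # split along columns

x = jax.device_put(jnp.arange(64.0).reshape(8, 8), row_sharded)
y = jax.device_put(x, col_sharded)  # resharding: all-to-all-style data movement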
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a…
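A schematic usage sketch adapted from Alpa's published examples (API details vary across versions, and create_train_state plus the model_state methods below are placeholders):

import alpa
import jax.numpy as jnp

@alpa.parallelize            # Alpa generates the inter-/intra-operator plan
def train_step(model_state, batch):
    def loss_func(params):
        out = model_state.forward(params, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)
    grads = alpa.grad(loss_func)(model_state.params)  # alpa.grad in place of jax.grad
    return model_state.apply_gradient(grads)

# model_state = create_train_state()   # placeholder setup
# for batch in data_loader:
#     model_state = train_step(model_state, batch)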