Quanquan Gu
Higher-order Linear Attention
The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restri…
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
Empirical scaling laws prescribe how to allocate parameters, data, and compute, while maximal-update parameterization ($μ$P) enables learning-rate transfer across widths by equalizing early-time update magnitudes. However, in modern scale-…
Light intensity exercise and sleep: Indications of enhanced oxygen diffusion, perfusion, and metabolism
Background Hypoxia underlies or complicates a wide range of chronic conditions. Previous research suggests slower-paced exercises may develop states of relaxation, combined with enhanced respiration, which may trigger increased oxygen perf…
A paradigm shift in light intensity exercise and sleep: Indications of enhanced oxygen diffusion and metabolism and implications for healing and cellular regeneration
Background Hypoxia underlies or complicates a wide range of chronic conditions. Light-intensity exercises may develop states of relaxation, combined with enhanced respiration, which may trigger accelerated diffusion and facilitated oxygen …
Causal Attention with Lookahead Keys
In standard causal attention, each token's query, key, and value (QKV) are static and encode only preceding context. We introduce CAuSal aTtention with Lookahead kEys (CASTLE), an attention mechanism that continually updates each token's k…
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophist…
The Inaugural Flatiron Institute Cryo-EM Conformational Heterogeneity Challenge
Despite the rise of single particle cryo-electron microscopy (cryo-EM) as a premier method for resolving macromolecular structures at atomic resolution, methods to address molecular heterogeneity in vitrified samples have yet to reach matu…
A Paradigm Shift in Walking, Sleep, and Exercise: Unique Effects on Blood Oxygen Saturation, Oxygen Diffusion, and Cellular Metabolism
Hypoxia underlies or complicates a wide range of chronic conditions, including cancer, arthritis, chronic pain, multiple sclerosis, stroke, chronic kidney disease, diabetes and more. Research is presented supporting indications that slower…
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to learn the expert policy from a limited number of demonstration…
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs) that closely mirrors real-world software development workflows. Unlike traditional static benchmarks, SwingArena models the collaborative process of…
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet the design surface, choice of KL direction (forward vs. reverse), normali…
Stimulus-responsive cellulose hydrogels in biomedical applications and challenges
Stimuli-responsive cellulose hydrogels have garnered significant attention in the biomedical field owing to their extensive applications in tissue engineering and controlled drug delivery systems. Derived from cellulose and its derivatives…
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization
Despite deep neural networks' powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like …
The Potential Impact of Antibiotic Exposure on the Microbiome and Human Health
Antibiotics are a cornerstone of modern medicine, saving countless lives. However, their widespread use presents two major challenges. First, antibiotic-induced changes in the microbiome can disrupt immune function, increasing the suscepti…
CryoSTAR: Leveraging Structural Prior and Constraints for Cryo-EM Heterogeneous Reconstruction
Resolving conformational heterogeneity in cryo-electron microscopy (cryo-EM) datasets remains a significant challenge in structural biology. Previous methods have often been restricted to working exclusively on volumetric densities, neglec…
Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression
Exponential moving average (EMA) has recently gained significant popularity in training modern deep learning models, especially diffusion-based generative models. However, there have been few theoretical results explaining the effectivenes…
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Reinforcement Learning from Human Feedback (RLHF) has been highly successful in aligning large language models with human preferences. While prevalent methods like DPO have demonstrated strong performance, they frame interactions with the …
Logarithmic Regret for Online KL-Regularized Reinforcement Learning
Recent advances in Reinforcement Learning from Human Feedback (RLHF) have shown that KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models (LLMs). Despite its empirical advantage, th…
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
Although many popular reinforcement learning algorithms are underpinned by $f$-divergence regularization, their sample complexity with respect to the \emph{regularized objective} still lacks a tight characterization. In this paper, we anal…
Seeing through “brain fog”: neuroimaging assessment and imaging biomarkers for cancer-related cognitive impairments
Advances in cancer diagnosis and treatment have substantially improved patient outcomes and survival in recent years. However, up to 75% of cancer patients and survivors, including those with non-central nervous system (non-CNS) cancers, s…
MARS: Unleashing the Power of Variance Reduction for Training Large Models
Training deep neural networks--and, more recently, large models--demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous…
ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design
Nature creates diverse proteins through a 'divide and assembly' strategy. Inspired by this idea, we introduce ProteinWeaver, a two-stage framework for protein backbone design. Our method first generates individual protein domains and then …
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Reverse Kullback-Leibler (KL) regularization has emerged as a predominant technique used to enhance policy optimization in reinforcement learning (RL) and reinforcement learning from human feedback (RLHF), which forces the learned polic…
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Sequential reasoning in agent systems has been significantly advanced by large language models (LLMs), yet existing approaches face limitations. Reflection-driven reasoning relies solely on knowledge in pretrained models, limiting performa…
Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers
Score-based diffusion models have emerged as powerful techniques for generating samples from high-dimensional data distributions. These models involve a two-phase process: first, injecting noise to transform the data distribution into a kn…
DPLM-2: A Multimodal Diffusion Protein Language Model
Proteins are essential macromolecules defined by their amino acid sequences, which determine their three-dimensional structures and, consequently, their functions in all living organisms. Therefore, generative protein modeling necessitates…
CryoFM: A Flow-based Foundation Model for Cryo-EM Densities
Cryo-electron microscopy (cryo-EM) is a powerful technique in structural biology and drug discovery, enabling the study of biomolecules at high resolution. Significant advancements by structural biologists using cryo-EM have led to the pro…
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Reinforcement Learning (RL) plays a crucial role in aligning large language models (LLMs) with human preferences and improving their ability to perform complex tasks. However, current approaches either require significant computational res…
Accelerated Preference Optimization for Large Language Model Alignment
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences. Direct Preference Optimization (DPO), one of the most popular approaches, formulates RLHF as …
Redox-Detecting Deep Learning for Mechanism Discernment in Cyclic Voltammograms of Multiple Redox Events
In electrochemical analysis, mechanism assignment is fundamental to understanding the chemistry of a system. The detection and classification of electrochemical mechanisms in cyclic voltammetry set the foundation for subsequent quantitativ…