Danqi Chen
Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting Open
Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines for mitigating thi…
Extracting Rule-based Descriptions of Attention Features in Transformers Open
Mechanistic interpretability strives to explain model behavior in terms of bottom-up primitives. The leading paradigm is to express hidden states as a sparse linear combination of basis vectors, called features. However, this only identifi…
The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege Open
The latest Audio Language Models (Audio LMs) process speech directly instead of relying on a separate transcription step. This shift preserves detailed information, such as intonation or the presence of multiple speakers, that would othe…
Sustainable valorization of tea waste by enhanced caffeine extraction via microbial-driven solid-state fermentation Open
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking Open
Recent work has identified retrieval heads, a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this pa…
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning Open
Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training language models (LMs) on reasoning tasks that elicit emergent long chains of thought (CoTs). Unlike supervised learning, it updates the model using …
Formaldehyde Exposure Induces Systemic Epigenetic Alterations in Histone Methylation and Acetylation Open
Formaldehyde (FA) is a pervasive environmental organic pollutant and a Group 1 human carcinogen. While FA has been implicated in various cancers, its genotoxic effects, including DNA damage and DNA-protein crosslinking, have proven insuffi…
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation Open
Modern language models are trained on large, unstructured datasets consisting of trillions of tokens and obtained by crawling the web. The unstructured nature makes it difficult to reason about their contents and develop systematic approac…
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving Open
We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems. A key challenge in this field is the scarcity of form…
Optimizing the thermostability of triketone dioxygenase for engineering tolerance to mesotrione herbicide in soybean and cotton Open
Optimized triketone dioxygenase (TDO) variants with enhanced temperature stability parameters were engineered to enable robust triketone tolerance in transgenic cotton and soybean crops. This herbicide tolerance trait, which can metabolize…
Metadata Conditioning Accelerates Language Model Pre-training Open
The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in eac…
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Open
Existing benchmarks for evaluating long-context language models (LCLMs) primarily focus on long-context recall, requiring models to produce short responses based on a few critical snippets while processing thousands of irrelevant tokens. W…
Industrial Robots and Migrants’ Settlement Intention in Cities: A Study on China Open
Continual Memorization of Factoids in Language Models Open
As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tu…
Efficacy and Safety of DPP-4 Inhibitors and Metformin Combinations in Type 2 Diabetes: A Systematic Literature Review and Network Meta-Analysis [Corrigendum] Open
[This corrects the article DOI: 10.2147/DMSO.S450994.]
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization Open
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences. Although these methods are designed to teach a model to generate preferred responses more frequently relative …
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly Open
Many benchmarks exist for evaluating long-context language models (LCLMs), yet developers often rely on synthetic tasks such as needle-in-a-haystack (NIAH) or an arbitrary subset of tasks. However, it remains unclear whether these benchmar…
How to Train Long-Context Language Models (Effectively) Open
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. We first establish a reliable evaluation protocol to guide model development -- instead of perplexity …
Diagnosis of ecological security and the spatial heterogeneity of its driving factors in the mining-impacted watershed, based on ecosystem health-risk-services framework Open
A comprehensive diagnosis of ecological security (ES) and its driving mechanisms in the watershed under mining influence is essential for the conservation and restoration of watershed ecosystems. Few studies have comprehensively evaluated …
Parental perceptions and experiences of kangaroo care for preterm infants in neonatal intensive care units in China: a qualitative study Open
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval Open
Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries requi…
Representing Rule-based Chatbots with Transformers Open
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing …
LitSearch: A Retrieval Benchmark for Scientific Literature Search Open
Literature search questions, such as "Where can I find research on the evaluation of consistency in generated summaries?" pose significant challenges for modern search engines and retrieval systems. These questions often require a deep und…
Healthcare providers' perceptions and experiences of kangaroo mother care for preterm infants in four neonatal intensive care units in China: a qualitative descriptive study Open
Background: Kangaroo mother care (KMC) is an evidence-based intervention that can effectively reduce morbidity and mortality in preterm infants, but it has yet to be widely implemented in health systems in China. Most qualitative studies on…
Remodeling tumor‐associated macrophage for anti‐cancer effects by rational design of irreversible inhibition of mitogen‐activated protein kinase‐activated protein kinase 2 Open
Mitogen‐activated protein kinase‐activated protein kinase 2 (MK2) emerges as a pivotal target in developing anti‐cancer therapies. The limitations of ATP‐competitive inhibitors, due to insufficient potency and selectivity, underscore the u…