Simon S. Du
Optimal Multi-Distribution Learning
Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across k distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness,…
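For reference, the worst-case objective described above can be written as a min-max problem (the notation below is generic, not necessarily the paper's):

```latex
% Worst-case objective over k data distributions D_1, ..., D_k, with hypothesis
% class H and loss \ell (notation is generic, not necessarily the paper's):
\[
  \min_{h \in \mathcal{H}} \; \max_{i \in [k]} \; R_i(h),
  \qquad
  R_i(h) \;=\; \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\!\left[\ell\big(h(x), y\big)\right].
\]
```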
Policy-Based Trajectory Clustering in Offline Reinforcement Learning
We introduce a novel task of clustering trajectories from offline reinforcement learning (RL) datasets, where each cluster center represents the policy that generated its trajectories. By leveraging the connection between the KL-divergence…
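A minimal sketch of the kind of alternating scheme such a formulation suggests, assuming (purely for illustration) tabular policies as cluster centers and log-likelihood-based assignment; this is not the paper's algorithm:

```python
import numpy as np

# Illustrative sketch only: trajectories are lists of (state, action) pairs over
# discrete state/action spaces, and each cluster center is a tabular policy
# re-estimated from the trajectories currently assigned to it. This is an assumed
# setup for illustration, not the paper's algorithm.

def estimate_policy(trajs, n_states, n_actions, alpha=1.0):
    """Maximum-likelihood tabular policy with add-alpha smoothing."""
    counts = np.full((n_states, n_actions), alpha)
    for traj in trajs:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(traj, policy):
    """Log-probability of the trajectory's actions under the policy."""
    return sum(np.log(policy[s, a]) for s, a in traj)

def cluster_trajectories(trajs, k, n_states, n_actions, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(trajs))
    for _ in range(n_iters):
        # Re-estimate one policy per cluster from its assigned trajectories.
        policies = [estimate_policy([t for t, c in zip(trajs, assign) if c == j],
                                    n_states, n_actions) for j in range(k)]
        # Reassign each trajectory to the policy most likely to have generated it.
        assign = np.array([int(np.argmax([log_likelihood(t, p) for p in policies]))
                           for t in trajs])
    return assign, policies
```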
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
We present a fine-grained theoretical analysis of the performance gap between reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) under a representation gap. Our study decomposes this gap into two sou…
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
While an image is worth more than a thousand words, only a few provide crucial information for a given task and thus should be focused on. In light of this, ideal text-to-image (T2I) retrievers should prioritize specific visual attributes …
Settling the Sample Complexity of Online Reinforcement Learning
A central issue in online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large…
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Being able to cooperate with diverse humans is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the …
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
Zero-shot coordination (ZSC), the ability to adapt to a new partner in a cooperative task, is a critical component of human-compatible AI. While prior work has focused on training agents to cooperate on a single task, these specialized mod…
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assump…
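For context, the generic extragradient update the title refers to, shown on a toy bilinear zero-sum game; this illustrates only the optimizer, not the paper's preference-optimization objective:

```python
import numpy as np

# Generic extragradient update on a toy unconstrained bilinear game min_x max_y x^T A y.
# This illustrates only the optimizer named in the title, not the paper's
# preference-optimization objective.
def extragradient(A, x, y, eta=0.1, steps=1000):
    for _ in range(steps):
        # Extrapolation (half) step using gradients at the current point.
        x_half = x - eta * (A @ y)
        y_half = y + eta * (A.T @ x)
        # Update step using gradients evaluated at the extrapolated point.
        x = x - eta * (A @ y_half)
        y = y + eta * (A.T @ x_half)
    return x, y

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x, y = extragradient(A, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the saddle point at the origin
```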
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Recent advances in deep learning optimization have unveiled two intriguing phenomena under large learning rates: Edge of Stability (EoS) and Progressive Sharpening (PS), challenging classical Gradient Descent (GD) analyses. Current researc…
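As background, both phenomena are usually stated relative to the classical stability threshold of gradient descent (a standard fact, not the paper's result):

```latex
% Classical descent-regime condition that EoS sits at the boundary of: for GD with
% step size \eta, the quadratic model contracts only while the sharpness stays
% below 2/\eta.
\[
  x_{t+1} = x_t - \eta \nabla L(x_t), \qquad
  \lambda_{\max}\!\big(\nabla^2 L(x_t)\big) \;<\; \frac{2}{\eta}.
\]
% EoS: the sharpness hovers near 2/\eta. PS: the sharpness rises steadily during training.
```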
SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters
While large language models (LLMs) have advanced natural language processing tasks, their growing computational and memory demands make deployment on resource-constrained devices like mobile phones increasingly challenging. In this paper, …
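One hypothetical reading of the title's sharing-with-recovery idea, purely for illustration: a layer reuses its neighbor's frozen weights and only learns a small low-rank correction. This construction is an assumption, not necessarily the paper's.

```python
import torch
import torch.nn as nn

# Hypothetical reading of "share adjacent layers with recovery parameters": a layer
# reuses its neighbor's frozen weights and only learns a small low-rank correction.
# This construction is an assumption for illustration, not necessarily the paper's.
class SharedLinearWithRecovery(nn.Module):
    def __init__(self, shared_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared_linear                      # borrowed from the adjacent layer
        for p in self.shared.parameters():
            p.requires_grad_(False)                      # shared weights stay frozen
        d_out, d_in = shared_linear.weight.shape
        self.down = nn.Linear(d_in, rank, bias=False)    # trainable "recovery" parameters
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)                   # start as an exact copy of the shared layer

    def forward(self, x):
        return self.shared(x) + self.up(self.down(x))
```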
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Current state-of-the-art video generative models can produce commercial-grade videos with highly realistic details. However, they still struggle to coherently present multiple sequential events in the stories specified by the prompts, …
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Reinforcement Learning from Human Feedback (RLHF) is currently the leading approach for aligning large language models with human preferences. Typically, these models rely on extensive offline preference datasets for training. However, off…
Anytime Acceleration of Gradient Descent
This work investigates stepsize-based acceleration of gradient descent with anytime convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve converg…
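For comparison, the standard constant-stepsize baseline and what an "anytime" guarantee asks for (both standard facts, not the paper's result):

```latex
% Baseline for comparison: constant-stepsize GD with step size 1/L on an L-smooth
% convex f satisfies, for every T,
\[
  f(x_T) - f^\star \;\le\; \frac{L\,\|x_0 - x^\star\|^2}{2T}.
\]
% "Anytime" acceleration asks for a faster-than-O(1/T) guarantee that holds
% simultaneously at every iterate T, rather than only at one horizon fixed in advance.
```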
Learning to Cooperate with Humans using Generative Agents
Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL). Current algorithms focus on training simulated human partner policies which are then used to train a Cooperator agent.…
Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
Recent research has shown that CLIP models struggle with visual reasoning tasks that require grounding compositionality, understanding spatial relationships, or capturing fine-grained details. One natural hypothesis is that the CLIP vision…
Transformers are Efficient Compilers, Provably
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first s…
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
We initiate the study of Preference-Based Multi-Agent Reinforcement Learning (PbMARL), exploring both theoretical foundations and empirical validations. We define the task as identifying the Nash equilibrium from a preference-only offline …
Understanding the Gains from Repeated Self-Distillation
Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model. Despite using the same architecture and the same training data, self-distillation has been empirically ob…
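A minimal sketch of repeated self-distillation in a simple ridge-regression instantiation, assuming (for illustration) that each round refits to a mix of the original labels and the previous round's predictions; the paper's exact setting may differ:

```python
import numpy as np

# Minimal sketch of repeated self-distillation with ridge regression (the same
# "architecture" at every round): each round refits to a mix of the original labels
# and the previous round's predictions. Illustrative only; the paper's exact setting
# may differ.
def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def repeated_self_distillation(X, y, lam=1.0, rounds=3, mix=0.5):
    targets = y.copy()
    for _ in range(rounds):
        w = ridge_fit(X, targets, lam)
        targets = mix * y + (1.0 - mix) * (X @ w)   # soften labels with the model's own predictions
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
w = repeated_self_distillation(X, y)
```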
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is bottlenecked by the size of human preference data. While traditional methods rely on offline preference …
Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data that are generated by a single ground truth Gauss…
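A minimal sketch of gradient EM in this over-parameterized setting, assuming (for illustration) known isotropic covariance, uniform mixing weights, and an assumed step size; this is not the paper's exact parameterization:

```python
import numpy as np

# Minimal sketch of gradient EM for a GMM with known isotropic covariance and uniform
# mixing weights, fitting n > 1 components to data drawn from a single ground-truth
# Gaussian (the over-parameterized setting described above). The step size and
# parameterization are assumptions for illustration.
def responsibilities(X, mus):
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)          # squared distances
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / 2)          # stable softmax numerator
    return w / w.sum(axis=1, keepdims=True)

def gradient_em(X, n_components, eta=1.0, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    mus = rng.normal(size=(n_components, X.shape[1]))
    for _ in range(steps):
        w = responsibilities(X, mus)                                # E-step
        grad = (w[:, :, None] * (X[:, None, :] - mus[None, :, :])).mean(axis=0)
        mus = mus + eta * grad                                      # one gradient ascent M-step
    return mus

X = np.random.default_rng(1).normal(size=(1000, 2))                 # single ground-truth Gaussian
mus = gradient_em(X, n_components=3)
```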
Decoding-Time Language Model Alignment with Multiple Objectives
Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting thei…
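A hedged sketch of decoding-time combination across objectives: given next-token logits from models each aligned to a single objective, mix them with user-chosen weights before sampling. The plain linear combination below is an illustrative assumption; the paper's combination rule may differ.

```python
import numpy as np

# Hedged illustration of decoding-time combination across objectives: given next-token
# logits from several models, each aligned to a single objective, mix them with
# user-chosen weights before sampling. The plain linear combination is an assumption
# for illustration; the paper's combination rule may differ.
def sample_next_token(per_objective_logits, weights, rng):
    mixed = sum(w * l for w, l in zip(weights, per_objective_logits))
    probs = np.exp(mixed - mixed.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logits_helpful = rng.normal(size=50)   # hypothetical model aligned for helpfulness
logits_safe = rng.normal(size=50)      # hypothetical model aligned for safety
token = sample_next_token([logits_helpful, logits_safe], weights=[0.7, 0.3], rng=rng)
```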
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Data selection has emerged as a core issue for large-scale visual-language model pretraining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to ai…
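For context, the CLIP-similarity baseline that such data-selection work starts from (not the paper's proposed CLIPLoss or norm-based scores): rank image-text pairs by embedding cosine similarity and keep the top fraction.

```python
import numpy as np

# The CLIP-similarity baseline that data-selection work starts from (not the paper's
# proposed CLIPLoss or norm-based scores): rank image-text pairs by cosine similarity
# of their precomputed CLIP embeddings and keep the top fraction.
def clip_score_filter(img_emb, txt_emb, keep_frac=0.3):
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    scores = (img * txt).sum(axis=1)              # cosine similarity per pair
    k = int(len(scores) * keep_frac)
    return np.argsort(scores)[::-1][:k]           # indices of the pairs to keep
```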
Rethinking Transformers in Solving POMDPs
Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transform…
Horizon-Free Regret for Linear Markov Decision Processes
A recent line of work showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of planning horizon, a.k.a. horizon-free bounds. However, these regret bounds only apply to settings where a polynomial dependency o…
Distributional Successor Features Enable Zero-Shot Policy Optimization
Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through…
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
As language models (LMs) demonstrate their capabilities in various fields, their application to tasks requiring multi-round interactions has become increasingly popular. These tasks usually have complex dynamics, so supervised fine-tuning …
Offline Multi-task Transfer RL with Representational Penalization
We study the problem of representation transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in f…
Learning Optimal Tax Design in Nonatomic Congestion Games
In multiplayer games, self-interested behavior among the players can harm the social welfare. Tax mechanisms are a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step of learni…
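Classical background for the sentence above (a standard result, not the paper's contribution): in a nonatomic congestion game with edge latency c_e(·), the marginal-cost (Pigouvian) tax aligns the equilibrium flow with the socially optimal flow.

```latex
% Classical background: with edge latency c_e(f_e), the marginal-cost (Pigouvian)
% tax makes the equilibrium flow coincide with the flow minimizing the social cost
% \sum_e f_e c_e(f_e):
\[
  \tau_e(f_e) \;=\; f_e \, c_e'(f_e).
\]
```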
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable u…
Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning
In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets. One widely adopted strategy assigns quality scores such as CLIP similarity for each sa…