Benjamin Eysenbach
BuilderBench -- A benchmark for generalist agents
Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and …
Self-Supervised Goal-Reaching Results in Multi-Agent Cooperation and Exploration
For groups of autonomous agents to achieve a particular goal, they must engage in coordination and long-horizon reasoning. However, designing reward functions to elicit such behavior is challenging. In this paper, we study how self-supervi…
Contrastive Representations for Temporal Reasoning
In classical AI, perception relies on learning state-based representations, while planning, which can be thought of as temporal reasoning over action sequences, is typically achieved through search. We study whether such reasoning can inst…
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, termed mutual information skill learning (MISL). These methods aim to learn a representation of the env…
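As background for this snippet, a common recipe behind MISL methods is to maximize I(s; z) by training a classifier to recover the skill z from the states it visits and using its log-probability as an intrinsic reward. A minimal DIAYN-style sketch of that objective follows; the network sizes and names are illustrative assumptions, not the specific method analyzed in the paper.

# Illustrative sketch (assumption): a DIAYN-style, discriminator-based MISL objective.
# The intrinsic reward log q(z|s) - log p(z) pushes each skill toward states that identify it.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 17  # hypothetical sizes

discriminator = nn.Sequential(      # q(z | s): predicts which skill produced a state
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, n_skills),
)

def misl_intrinsic_reward(states, skills):
    # states: (B, state_dim) float tensor; skills: (B,) integer skill ids
    log_q = F.log_softmax(discriminator(states), dim=-1)
    log_q_z_given_s = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    log_p_z = torch.log(torch.tensor(1.0 / n_skills))   # uniform skill prior
    return log_q_z_given_s - log_p_z                     # variational lower bound on I(s; z)

def discriminator_loss(states, skills):
    return F.cross_entropy(discriminator(states), skills)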
Intention-Conditioned Flow Occupancy Models
Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once, and then can be used by anyone in the community (including those without data or compute resources to…
Horizon Reduction Makes RL Scalable
In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, c…
Normalizing Flows are Capable Models for RL
Modern reinforcement learning (RL) algorithms have found success by using powerful probabilistic models, such as transformers, energy-based models, and diffusion/flow-based models. To this end, RL researchers often choose to pay the price …
View article: The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities Open
While internet-scale data often comes in pairs (e.g., audio/image, image/text), we often want to perform inferences over modalities unseen together in the training data (e.g., audio/text). Empirically, this can often be addressed by learni…
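For context, the standard recipe for aligning paired modalities is a symmetric InfoNCE loss over a batch of pairs, with matched examples on the diagonal of a similarity matrix; a minimal sketch follows. The encoders producing the embeddings and the temperature value are assumptions for illustration, and the paper's question (inference across modalities never paired in training) goes beyond this recipe.

# Illustrative sketch (assumption): symmetric InfoNCE over a batch of paired embeddings,
# the standard recipe for contrastively aligning two modalities (e.g., image/text).
import torch
import torch.nn.functional as F

def symmetric_info_nce(x_emb, y_emb, temperature=0.1):
    # x_emb, y_emb: (B, d) embeddings of paired examples
    x = F.normalize(x_emb, dim=-1)
    y = F.normalize(y_emb, dim=-1)
    logits = x @ y.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(x.shape[0])       # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))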
Horizon Generalization in Reinforcement Learning
We study goal-conditioned RL through the lens of generalization, but not in the traditional sense of random augmentations and domain randomization. Rather, we aim to learn goal-directed policies that generalize with respect to the horizon:…
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Self-supervised learning has the potential of lifting several of the key challenges in reinforcement learning today, such as exploration, representation learning, and reward design. Recent work (METRA) has effectively argued that moving aw…
Learning to Assist Humans without Inferring Rewards
Assistive agents should make humans' lives easier. Classically, such assistance is studied through the lens of inverse reinforcement learning, where an assistive agent (e.g., a chatbot, a robot) infers a human's intention and then selects …
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals fo…
OGBench: Benchmarking Offline Goal-Conditioned RL
Offline goal-conditioned reinforcement learning (GCRL) is a major problem in reinforcement learning (RL) because it provides a simple, unsupervised, and domain-agnostic way to acquire diverse behaviors and representations from unlabeled da…
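For readers who want to try the benchmark, the released ogbench package (as I recall its README) exposes a one-call helper that builds an environment together with its offline train/validation datasets; the function name, dataset id, and dictionary keys below are recalled from memory and should be checked against the project repository.

# Sketch of loading an OGBench task and its offline datasets.
# make_env_and_datasets, the dataset id, and the dict keys are assumptions from memory.
import ogbench

env, train_dataset, val_dataset = ogbench.make_env_and_datasets('antmaze-large-navigate-v0')
print(train_dataset['observations'].shape)  # datasets are dicts of arrays (observations, actions, ...)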
Accelerating Goal-Conditioned RL Algorithms and Research
Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed …
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observatio…
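The simple algorithm referenced in this title is contrastive RL, which in its original form learns a critic as an inner product between a state-action embedding and a goal embedding, trained to classify states actually reached later in the same trajectory against negatives drawn from other trajectories. A minimal sketch is below; the network sizes and widths are illustrative assumptions.

# Illustrative sketch of a contrastive RL critic f(s, a, g) = phi(s, a) . psi(g),
# trained with InfoNCE where future_states[i] was reached after (states[i], actions[i]).
# Dimensions and network widths are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, repr_dim = 17, 6, 64  # hypothetical sizes

phi = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim))
psi = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim))

def contrastive_critic_loss(states, actions, future_states):
    sa_repr = phi(torch.cat([states, actions], dim=-1))  # (B, repr_dim)
    g_repr = psi(future_states)                          # (B, repr_dim)
    logits = sa_repr @ g_repr.t()                        # (B, B); diagonal entries are positives
    labels = torch.arange(states.shape[0])
    return F.cross_entropy(logits, labels)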
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temp…
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, w…
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-sought property is one of the few ways in which RL methods based on dynamic-programming differ from RL …
Bridging State and History Representations: Understanding Self-Predictive RL
Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical fra…
Learning to Reach Goals via Iterated Supervised Learning
Current reinforcement learning (RL) algorithms can be brittle and difficult to use, especially when learning goal-reaching behaviors from sparse rewards. Although supervised imitation learning provides a simple and stable alternative, it r…
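The iterated supervised learning loop behind this paper (often called GCSL) is simple: roll out the current goal-conditioned policy, relabel each state with a goal taken from later in the same trajectory, and imitate the actions that led there. The outline below is an illustrative sketch; collect_trajectory and optimize_bc are hypothetical helpers, not the paper's code.

# Illustrative outline of iterated supervised goal reaching (GCSL-style).
# collect_trajectory(policy) -> list of (state, action) pairs; optimize_bc fits the policy
# to predict action from (state, relabeled goal). Both helpers are hypothetical.
import random

def gcsl_iteration(policy, collect_trajectory, optimize_bc, num_episodes=10):
    relabeled = []
    for _ in range(num_episodes):
        traj = collect_trajectory(policy)
        for t, (state, action) in enumerate(traj):
            k = random.randrange(t, len(traj))   # hindsight: pick a time step at or after t
            goal = traj[k][0]                    # the state reached there becomes the goal
            relabeled.append((state, goal, action))
    optimize_bc(policy, relabeled)               # supervised learning on the relabeled data
    return policy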
Contrastive Difference Predictive Coding
Predicting and reasoning about the future lie at the heart of many time-series questions. For example, goal-conditioned reinforcement learning can be viewed as learning representations to predict which states are likely to be visited in th…
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while c…
Contrastive Example-Based Control
While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by th…
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use…
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term depe…
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leve…
Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts
Training machine learning models robust to distribution shifts is critical for real-world applications. Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points. Othe…
Learning Options via Compression
Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected exp…
Contrastive Value Learning: Implicit Models for Simple Offline RL
Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics mo…
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
While reinforcement learning (RL) methods that learn an internal model of the environment have the potential to be more sample efficient than their model-free counterparts, learning to model raw observations from high dimensional sensors c…