Will Dabney
Depression as a disorder of distributional coding
Major depressive disorder remains a major public health problem. While some progress has been made toward effective treatments, the neural mechanisms that give rise to the disorder remain poorly understood. In this Perspecti…
Uncertainty Prioritized Experience Replay
Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions …
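The excerpt cuts off before the paper's uncertainty-based priority is defined, but the surrounding mechanism is standard prioritized replay. Below is a minimal sketch of proportional prioritized sampling, assuming a generic scalar priority per transition (classic PER uses the absolute TD error; an uncertainty estimate would be plugged in instead). Class and method names are illustrative, not from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (Schaul et al. 2016 style).

    Priorities are generic scalars: classic PER uses |TD error|, while an
    uncertainty-prioritized variant would supply an uncertainty estimate.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha   # how strongly priorities skew sampling
        self.eps = eps       # keeps every priority strictly positive
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(priority) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, new_priorities):
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = (abs(pr) + self.eps) ** self.alpha
```

After each learning step, `update_priorities` is called with the freshly computed priorities for the sampled indices, so frequently surprising transitions are revisited more often.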
Plasticity as the Mirror of Empowerment
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intellige…
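For orientation, a standard formalisation of empowerment from the literature (e.g. Klyubin et al.), not a quote from this paper: empowerment is the channel capacity from a sequence of $k$ actions to the resulting future state,

$$\mathcal{E}(s) \;=\; \max_{p(a_{1:k})} I\big(A_{1:k};\, S_{k+1} \,\big|\, S_1 = s\big),$$

i.e. the maximal mutual information an agent's actions can carry about what it observes next.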
Agency Is Frame-Dependent
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult q…
Discovering Symbolic Cognitive Models from Human and Animal Behavior
Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuit…
Optimizing Return Distributions with Distributional Dynamic Programming
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the s…
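For context, the object distributional DP iterates on is the return distribution, characterised by the distributional Bellman equation (standard notation, not quoted from the paper):

$$G^\pi(x,a) \;\overset{D}{=}\; R(x,a) + \gamma\, G^\pi(X', A'), \qquad X' \sim P(\cdot \mid x,a),\ A' \sim \pi(\cdot \mid X'),$$

where $\overset{D}{=}$ denotes equality in distribution. Standard reinforcement learning is recovered by taking expectations of both sides.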
Lifelong Reinforcement Learning via Neuromodulation
Navigating multiple tasks (for instance in succession, as in continual or lifelong learning, or in distribution, as in meta- or multi-task learning) requires some notion of adaptation. Evolution over timescales …
Normalization and effective learning rates in reinforcement learning
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combattin…
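One standard mechanism behind the learning-rate effect, known from earlier work on normalization and presumably at play here: a layer followed by normalization is scale-invariant, $f(cw) = f(w)$ for all $c > 0$, so the gradient is orthogonal to $w$, gradient steps grow $\lVert w \rVert$, and the effective step size of SGD behaves like

$$\eta_{\text{eff}} \;\approx\; \frac{\eta}{\lVert w \rVert^2},$$

meaning unchecked weight-norm growth silently anneals the learning rate over training.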
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent represent…
Understanding the performance gap between online and offline alignment algorithms
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the conte…
A Distributional Analogue to the Successor Representation
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes th…
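For context, the successor representation referenced here is (standard definition)

$$\Psi^\pi(x, x') \;=\; \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t\, \mathbb{1}\{X_t = x'\} \,\Big|\, X_0 = x\Big], \qquad V^\pi(x) \;=\; \sum_{x'} \Psi^\pi(x, x')\, r(x'),$$

so transition structure ($\Psi^\pi$) and reward ($r$) factor apart, and any reward function can be evaluated by a single linear pass. Per the abstract's opening sentence, the paper's distributional analogue extends this separation to return distributions.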
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open quest…
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling
We introduce off-policy distributional Q(λ), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q(λ) does not apply importance sampling for off-policy learning, which introduces i…
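The excerpt breaks off mid-sentence; for orientation, the expected-value Q(λ) update this algorithm generalises, in the no-importance-sampling form of Harutyunyan et al. (2016), is

$$Q(x,a) \;\leftarrow\; Q(x,a) + \alpha \sum_{t \ge 0} (\gamma\lambda)^t\, \delta_t, \qquad \delta_t = r_t + \gamma\, \mathbb{E}_{\pi} Q(x_{t+1}, \cdot) - Q(x_t, a_t),$$

where behaviour-policy actions feed the trace but each correction uses an expectation under the target policy $\pi$ rather than importance ratios. Per the title and abstract, the paper lifts this style of update from values to return distributions.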
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the obs…
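The loss underlying this approach (as in QR-DQN) is the quantile Huber loss, which trains fixed quantile fractions with an asymmetric Huber penalty. A minimal numpy sketch for a single state-action pair; shapes and names are illustrative:

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style).

    pred_quantiles: shape (N,), estimates for fractions tau_i = (2i + 1) / (2N)
    target_samples: shape (M,), samples of the Bellman target r + gamma * z'
    """
    N = len(pred_quantiles)
    taus = (np.arange(N) + 0.5) / N
    # Pairwise TD errors u[i, j] = target_j - pred_i.
    u = target_samples[None, :] - pred_quantiles[:, None]
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| makes each output track its quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

Minimising this loss drives the $i$-th output toward the $\tau_i$-quantile of the target return distribution, which is what lets the network represent the distribution rather than only its mean.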
Bootstrapped Representations in Reinforcement Learning
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try t…
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this ta…
Deep Reinforcement Learning with Plasticity Injection
A growing body of evidence suggests that neural networks employed in deep reinforcement learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the analysis and mitigation of this phenomenon is hampered …
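The injection mechanism can be sketched as follows; this is a paraphrase of the published idea with illustrative names, not code from the paper. The old head is frozen, and a fresh trainable copy minus a frozen clone of that same copy is added, so the output is unchanged at injection time while all new gradient flow goes through fresh weights.

```python
import copy
import torch.nn as nn

def inject_plasticity(head: nn.Module, make_fresh_head) -> nn.Module:
    """Plasticity injection (sketch): y = h(x) + h1(x) - h2(x).

    make_fresh_head() must return a newly initialised module with the
    same input/output shapes as `head`.
    """
    for p in head.parameters():
        p.requires_grad_(False)           # freeze the trained head

    fresh = make_fresh_head()             # trainable new parameters h1
    fresh_frozen = copy.deepcopy(fresh)   # frozen clone h2, same init as h1
    for p in fresh_frozen.parameters():
        p.requires_grad_(False)

    class Injected(nn.Module):
        def __init__(self):
            super().__init__()
            self.head, self.fresh, self.fresh_frozen = head, fresh, fresh_frozen

        def forward(self, x):
            # h1(x) - h2(x) == 0 at injection time, so behaviour is
            # preserved; afterwards gradients reach only self.fresh.
            return self.head(x) + self.fresh(x) - self.fresh_frozen(x)

    return Injected()
```

Because predictions are identical before and after injection, any subsequent performance change can be attributed to the restored plasticity rather than to a behavioural discontinuity.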
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve t…
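As a generic illustration only (the excerpt cuts off before the paper's actual estimator): singular vectors of an empirical transition matrix give state features that respect the environment's dynamics.

```python
import numpy as np

def svd_state_features(transition_counts, k=8):
    """Illustrative only: top-k left singular vectors of an empirical
    state-transition matrix, used as k-dimensional state features.
    Assumes every row of transition_counts has at least one count.
    """
    P = transition_counts / transition_counts.sum(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(P)
    return U[:, :k]  # one feature vector per state
```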
Understanding plasticity in neural networks
Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose p…
Distributional Reinforcement Learning
The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective. Distributional reinforcement learning is a new mathematical forma…
An Analysis of Quantile Temporal-Difference Learning
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empiric…
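For reference, the tabular QTD update analysed in this line of work is, roughly: given a sampled transition $(x, r, x')$ and quantile estimates $\theta_1(x), \dots, \theta_m(x)$ at levels $\tau_i$,

$$\theta_i(x) \;\leftarrow\; \theta_i(x) + \frac{\alpha}{m} \sum_{j=1}^{m} \Big( \tau_i - \mathbb{1}\big\{ r + \gamma\, \theta_j(x') < \theta_i(x) \big\} \Big),$$

so each estimate moves by a bounded, indicator-driven increment rather than by the magnitude of a TD error. These bounded increments are what make QTD's dynamics, and hence its analysis, quite different from classical TD.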
Settling the Reward Hypothesis
The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis.…
Understanding Self-Predictive Learning for Reinforcement Learning
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empi…
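A minimal sketch of the loss family being analysed (BYOL/SPR-style latent self-prediction; module names are illustrative, not from the paper): encode the current observation, predict the latent of the next observation, and regress onto a stop-gradient target encoding.

```python
import torch
import torch.nn.functional as F

def self_predictive_loss(encoder, predictor, target_encoder, obs, next_obs):
    """Latent self-prediction: match predictor(encoder(o_t)) to a
    detached target encoding of o_{t+1}. The stop-gradient on the target
    branch is central to analyses of why these losses avoid collapse.
    """
    z_pred = predictor(encoder(obs))         # predicted next latent
    with torch.no_grad():
        z_target = target_encoder(next_obs)  # no gradient to the target
    return F.mse_loss(z_pred, z_target)

def ema_update(target_encoder, encoder, tau=0.005):
    """Slowly track the online encoder with an exponential moving average."""
    for p_t, p in zip(target_encoder.parameters(), encoder.parameters()):
        p_t.data.lerp_(p.data, tau)
```

The target encoder is typically an EMA copy of the online encoder, updated with `ema_update` after each optimiser step.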
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the …
On the Expressivity of Markov Reward (Extended Abstract)
Reward is the driving force for reinforcement-learning agents. We here set out to understand the expressivity of Markov reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract …
Generalised Policy Improvement with Geometric Policy Composition
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geome…
Learning Dynamics and Generalization in Reinforcement Learning
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal differ…
Understanding and Preventing Capacity Loss in Reinforcement Learning
The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets c…
On the Expressivity of Markov Reward
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstr…