Will Dabney
Depression as a disorder of distributional coding
Major depressive disorder remains a major public health problem. While some progress has been made toward effective treatments, the neural mechanisms that give rise to the disorder remain poorly understood. In this Perspecti…
Uncertainty Prioritized Experience Replay
Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions …
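The excerpt cuts off before the paper's uncertainty-based priority is defined, but the surrounding mechanism is standard prioritized replay. Below is a minimal sketch of proportional prioritized sampling, assuming a generic scalar priority per transition (classic PER uses the absolute TD error; an uncertainty estimate would be plugged in instead). Class and method names are illustrative, not from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (Schaul et al. 2016 style).

    Priorities are generic scalars: classic PER uses |TD error|, while an
    uncertainty-prioritized variant would supply an uncertainty estimate.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha   # how strongly priorities skew sampling
        self.eps = eps       # keeps every priority strictly positive
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(priority) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, new_priorities):
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = (abs(pr) + self.eps) ** self.alpha
```

After each learning step, `update_priorities` is called with the freshly computed priorities for the sampled indices, so frequently surprising transitions are revisited more often.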
Plasticity as the Mirror of Empowerment
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intellige…
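For orientation, a standard formalisation of empowerment from the literature (e.g. Klyubin et al.), not a quote from this paper: empowerment is the channel capacity from a sequence of $k$ actions to the resulting future state,

$$\mathcal{E}(s) \;=\; \max_{p(a_{1:k})} I\big(A_{1:k};\, S_{k+1} \,\big|\, S_1 = s\big),$$

i.e. the maximal mutual information an agent's actions can carry about what it observes next.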
Agency Is Frame-Dependent
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult q…
Discovering Symbolic Cognitive Models from Human and Animal Behavior
Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuit…
Optimizing Return Distributions with Distributional Dynamic Programming
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the s…
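For context, the object distributional DP iterates on is the return distribution, characterised by the distributional Bellman equation (standard notation, not quoted from the paper):

$$G^\pi(x,a) \;\overset{D}{=}\; R(x,a) + \gamma\, G^\pi(X', A'), \qquad X' \sim P(\cdot \mid x,a),\ A' \sim \pi(\cdot \mid X'),$$

where $\overset{D}{=}$ denotes equality in distribution. Standard reinforcement learning is recovered by taking expectations of both sides.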
Lifelong Reinforcement Learning via Neuromodulation
Navigating multiple tasks (for instance in succession, as in continual or lifelong learning, or in distribution, as in meta- or multi-task learning) requires some notion of adaptation. Evolution over timescales …
Normalization and effective learning rates in reinforcement learning
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combattin…
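One standard mechanism behind the learning-rate effect, known from earlier work on normalization and presumably at play here: a layer followed by normalization is scale-invariant, $f(cw) = f(w)$ for all $c > 0$, so the gradient is orthogonal to $w$, gradient steps grow $\lVert w \rVert$, and the effective step size of SGD behaves like

$$\eta_{\text{eff}} \;\approx\; \frac{\eta}{\lVert w \rVert^2},$$

meaning unchecked weight-norm growth silently anneals the learning rate over training.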
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent represent…
Understanding the performance gap between online and offline alignment algorithms
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the conte…
A Distributional Analogue to the Successor Representation
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes th…
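For context, the successor representation referenced here is (standard definition)

$$\Psi^\pi(x, x') \;=\; \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t\, \mathbb{1}\{X_t = x'\} \,\Big|\, X_0 = x\Big], \qquad V^\pi(x) \;=\; \sum_{x'} \Psi^\pi(x, x')\, r(x'),$$

so transition structure ($\Psi^\pi$) and reward ($r$) factor apart, and any reward function can be evaluated by a single linear pass. Per the abstract's opening sentence, the paper's distributional analogue extends this separation to return distributions.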
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open quest…
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling
We introduce off-policy distributional Q(λ), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q(λ) does not apply importance sampling for off-policy learning, which introduces i…
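The excerpt breaks off mid-sentence; for orientation, the expected-value Q(λ) update this algorithm generalises, in the no-importance-sampling form of Harutyunyan et al. (2016), is

$$Q(x,a) \;\leftarrow\; Q(x,a) + \alpha \sum_{t \ge 0} (\gamma\lambda)^t\, \delta_t, \qquad \delta_t = r_t + \gamma\, \mathbb{E}_{\pi} Q(x_{t+1}, \cdot) - Q(x_t, a_t),$$

where behaviour-policy actions feed the trace but each correction uses an expectation under the target policy $\pi$ rather than importance ratios. Per the title and abstract, the paper lifts this style of update from values to return distributions.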
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the obs…
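The loss underlying this approach (as in QR-DQN) is the quantile Huber loss, which trains fixed quantile fractions with an asymmetric Huber penalty. A minimal numpy sketch for a single state-action pair; shapes and names are illustrative:

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style).

    pred_quantiles: shape (N,), estimates for fractions tau_i = (2i + 1) / (2N)
    target_samples: shape (M,), samples of the Bellman target r + gamma * z'
    """
    N = len(pred_quantiles)
    taus = (np.arange(N) + 0.5) / N
    # Pairwise TD errors u[i, j] = target_j - pred_i.
    u = target_samples[None, :] - pred_quantiles[:, None]
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| makes each output track its quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

Minimising this loss drives the $i$-th output toward the $\tau_i$-quantile of the target return distribution, which is what lets the network represent the distribution rather than only its mean.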
Bootstrapped Representations in Reinforcement Learning
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try t…
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this ta…
Deep Reinforcement Learning with Plasticity Injection
A growing body of evidence suggests that neural networks employed in deep reinforcement learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the analysis and mitigation of this phenomenon is hampered …
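The injection mechanism can be sketched as follows; this is a paraphrase of the published idea with illustrative names, not code from the paper. The old head is frozen, and a fresh trainable copy minus a frozen clone of that same copy is added, so the output is unchanged at injection time while all new gradient flow goes through fresh weights.

```python
import copy
import torch.nn as nn

def inject_plasticity(head: nn.Module, make_fresh_head) -> nn.Module:
    """Plasticity injection (sketch): y = h(x) + h1(x) - h2(x).

    make_fresh_head() must return a newly initialised module with the
    same input/output shapes as `head`.
    """
    for p in head.parameters():
        p.requires_grad_(False)           # freeze the trained head

    fresh = make_fresh_head()             # trainable new parameters h1
    fresh_frozen = copy.deepcopy(fresh)   # frozen clone h2, same init as h1
    for p in fresh_frozen.parameters():
        p.requires_grad_(False)

    class Injected(nn.Module):
        def __init__(self):
            super().__init__()
            self.head, self.fresh, self.fresh_frozen = head, fresh, fresh_frozen

        def forward(self, x):
            # h1(x) - h2(x) == 0 at injection time, so behaviour is
            # preserved; afterwards gradients reach only self.fresh.
            return self.head(x) + self.fresh(x) - self.fresh_frozen(x)

    return Injected()
```

Because predictions are identical before and after injection, any subsequent performance change can be attributed to the restored plasticity rather than to a behavioural discontinuity.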
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve t…
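As a generic illustration only (the excerpt cuts off before the paper's actual estimator): singular vectors of an empirical transition matrix give state features that respect the environment's dynamics.

```python
import numpy as np

def svd_state_features(transition_counts, k=8):
    """Illustrative only: top-k left singular vectors of an empirical
    state-transition matrix, used as k-dimensional state features.
    Assumes every row of transition_counts has at least one count.
    """
    P = transition_counts / transition_counts.sum(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(P)
    return U[:, :k]  # one feature vector per state
```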
Understanding plasticity in neural networks
Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose p…
Distributional Reinforcement Learning
The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective. Distributional reinforcement learning is a new mathematical forma…
An Analysis of Quantile Temporal-Difference Learning
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empiric…
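For reference, the tabular QTD update analysed in this line of work is, roughly: given a sampled transition $(x, r, x')$ and quantile estimates $\theta_1(x), \dots, \theta_m(x)$ at levels $\tau_i$,

$$\theta_i(x) \;\leftarrow\; \theta_i(x) + \frac{\alpha}{m} \sum_{j=1}^{m} \Big( \tau_i - \mathbb{1}\big\{ r + \gamma\, \theta_j(x') < \theta_i(x) \big\} \Big),$$

so each estimate moves by a bounded, indicator-driven increment rather than by the magnitude of a TD error. These bounded increments are what make QTD's dynamics, and hence its analysis, quite different from classical TD.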
Settling the Reward Hypothesis
The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis.…
Understanding Self-Predictive Learning for Reinforcement Learning
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empi…
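A minimal sketch of the loss family being analysed (BYOL/SPR-style latent self-prediction; module names are illustrative, not from the paper): encode the current observation, predict the latent of the next observation, and regress onto a stop-gradient target encoding.

```python
import torch
import torch.nn.functional as F

def self_predictive_loss(encoder, predictor, target_encoder, obs, next_obs):
    """Latent self-prediction: match predictor(encoder(o_t)) to a
    detached target encoding of o_{t+1}. The stop-gradient on the target
    branch is central to analyses of why these losses avoid collapse.
    """
    z_pred = predictor(encoder(obs))         # predicted next latent
    with torch.no_grad():
        z_target = target_encoder(next_obs)  # no gradient to the target
    return F.mse_loss(z_pred, z_target)

def ema_update(target_encoder, encoder, tau=0.005):
    """Slowly track the online encoder with an exponential moving average."""
    for p_t, p in zip(target_encoder.parameters(), encoder.parameters()):
        p_t.data.lerp_(p.data, tau)
```

The target encoder is typically an EMA copy of the online encoder, updated with `ema_update` after each optimiser step.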
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the …
On the Expressivity of Markov Reward (Extended Abstract)
Reward is the driving force for reinforcement-learning agents. We here set out to understand the expressivity of Markov reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract …
Generalised Policy Improvement with Geometric Policy Composition
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geome…
Learning Dynamics and Generalization in Reinforcement Learning
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal differ…
Understanding and Preventing Capacity Loss in Reinforcement Learning
The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets c…
On the Expressivity of Markov Reward
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstr…