Ian Osband
OpenAI o1 System Card
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our mo…
Approximate Thompson Sampling via Epistemic Neural Networks
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neu…
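For context, exact Thompson sampling is straightforward when the posterior has a closed form; below is a minimal sketch for a Bernoulli bandit with Beta posteriors (the toy setup and all names are illustrative, not from the paper):

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, num_steps, seed=0):
    """Exact Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    num_arms = len(true_probs)
    successes = np.ones(num_arms)  # Beta alpha parameters
    failures = np.ones(num_arms)   # Beta beta parameters
    total_reward = 0.0
    for _ in range(num_steps):
        # Sample one plausible mean per arm from the posterior, act greedily.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], num_steps=1000))
```

The intractability the abstract points to arises when this exact posterior is replaced by a deep model; the paper's proposal is to draw approximate posterior samples from an epistemic neural network instead.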
Fine-Tuning Language Models via Epistemic Neural Networks
Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data. However, typical fine-tuning schemes do not prioritize the examples that they tune on. We show that, if you can prioritize…
Robustness of Epinets against Distributional Shifts
Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, u…
Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping
In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches …
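As a rough, hedged illustration of the recipe named in the title: each ensemble member is fit to a bootstrap resample of the data and carries its own fixed additive prior function. Everything below (linear features, the 0.5 prior scale, the toy data) is illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    return np.stack([x, np.ones(len(x))], axis=1)  # linear features [x, 1]

def fit_member(x, y, prior_w):
    """One member: learnable linear part plus a fixed random prior function."""
    idx = rng.integers(0, len(x), size=len(x))     # bootstrap resample
    phi = features(x[idx])
    residual = y[idx] - phi @ prior_w              # learn what the prior misses
    w, *_ = np.linalg.lstsq(phi, residual, rcond=None)
    return w

# Toy data; the ensemble should disagree more away from the observed x-range.
x = rng.uniform(-1, 1, size=50)
y = np.sin(3 * x) + 0.1 * rng.normal(size=50)
members = []
for _ in range(10):
    prior_w = 0.5 * rng.normal(size=2)             # fixed, never trained
    members.append((fit_member(x, y, prior_w), prior_w))

x_test = np.array([-3.0, 0.0, 3.0])
preds = np.stack([features(x_test) @ (w + pw) for w, pw in members])
print(preds.std(axis=0))  # typically larger spread far from the data
```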
Evaluating High-Order Predictive Distributions in Deep Learning
Most research on supervised learning has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive …
The Neural Testbed: Evaluating Joint Predictions
Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, th…
Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?
Posterior predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed, which provides tools for the systematic evaluation of agents that generate such predictions. Crucially…
Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions
A fundamental challenge for any intelligent system is prediction: given some inputs $X_1, \ldots, X_\tau$, can you predict outcomes $Y_1, \ldots, Y_\tau$? The KL divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction quality, b…
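Reconstructing the truncated setup with assumed notation (not quoted from the paper): writing $P^*$ for the environment's true conditional distribution and $\hat{P}$ for the agent's joint predictive, the measure is the expected log-likelihood ratio

$$\mathbf{d}_{\mathrm{KL}}\big(P^* \,\|\, \hat{P}\big) = \mathbb{E}\left[\log \frac{P^*(Y_1, \ldots, Y_\tau \mid X_1, \ldots, X_\tau)}{\hat{P}(Y_1, \ldots, Y_\tau \mid X_1, \ldots, X_\tau)}\right],$$

which at $\tau = 1$ reduces to the familiar marginal log-loss (up to a constant independent of the agent), while larger $\tau$ probes joint predictions.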
From Predictions to Decisions: The Importance of Joint Predictive Distributions
A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes? Most work on supervised learning has focused on producing accurate marginal predictions for each input. However, w…
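A toy example of why joint predictions matter (numbers illustrative, not from the paper): two agents with identical marginals can disagree sharply about joint events, and decisions often hinge on exactly such events:

```python
# tau coin flips. Agent A: i.i.d. fair coins. Agent B: one shared coin that
# is either always-heads or always-tails, each with probability 0.5.
# Both report marginal P(heads) = 0.5 per flip, yet their joint predictions
# for "all tau flips come up heads" differ exponentially in tau.
tau = 3
p_iid = 0.5 ** tau     # Agent A: 0.125
p_shared = 0.5         # Agent B: 0.5 * 1 + 0.5 * 0
print(p_iid, p_shared)
```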
Epistemic Neural Networks
Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joi…
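A heavily simplified sketch of the epinet interface: a small network receives an extra "epistemic index" z alongside the input, and varying z varies the joint prediction. The real architecture conditions the epinet on base-network features and splits it into trainable and fixed parts; the sizes and wiring below are illustrative only, and training is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random (untrained) MLP weights; training is omitted in this sketch."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

index_dim = 8
base = mlp([2, 32, 3])                 # big base network: input -> logits
epinet = mlp([2 + index_dim, 16, 3])   # small epinet: [input, index] -> delta

def predict(x, z):
    """Epinet-style prediction: base output plus index-dependent correction."""
    xz = np.concatenate([x, np.broadcast_to(z, (len(x), index_dim))], axis=1)
    return forward(base, x) + forward(epinet, xz)

x = rng.normal(size=(4, 2))
samples = np.stack([predict(x, rng.normal(size=index_dim)) for _ in range(5)])
print(samples.shape)  # (5, 4, 3): 5 index draws -> 5 distinct joint predictions
```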
Reinforcement Learning, Bit by Bit
Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deep…
Hypermodels for Exploration
We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its siz…
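A minimal sketch of the hypermodel interface under the simplest, linear assumption (all parameters below are illustrative and untrained): the hypermodel maps an index z to base-model parameters, so drawing z stands in for drawing a posterior sample, as in Thompson sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
index_dim, param_dim = 4, 3

# Linear hypermodel: theta(z) = a + B z. Sampling z ~ N(0, I) induces a
# distribution over base-model parameters, standing in for a posterior.
# In practice a and B are trained; here they are random placeholders.
a = rng.normal(size=param_dim)
B = 0.1 * rng.normal(size=(param_dim, index_dim))

def sample_model():
    z = rng.normal(size=index_dim)
    return a + B @ z          # one sampled parameter vector

def act(action_features):
    """Thompson-style choice: sample one model, act greedily under it."""
    theta = sample_model()
    return int(np.argmax(action_features @ theta))

actions = rng.normal(size=(5, param_dim))  # one feature vector per action
print(act(actions))
```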
Matrix games with bandit feedback
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix…
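To make the feedback model concrete, a toy round of play under illustrative assumptions (rock-paper-scissors payoffs, Gaussian payoff noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum matrix game: row player gets A[i, j], column player gets -A[i, j].
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])   # rock-paper-scissors payoffs

def play_round(row_policy, col_policy, noise_std=0.5):
    """Bandit feedback: both players see the actions and one noisy payoff."""
    i = rng.choice(len(row_policy), p=row_policy)
    j = rng.choice(len(col_policy), p=col_policy)
    noisy_payoff = A[i, j] + noise_std * rng.normal()
    return i, j, noisy_payoff    # the matrix A itself is never revealed

uniform = np.ones(3) / 3
print(play_round(uniform, uniform))
```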
Making Sense of Reinforcement Learning and Probabilistic Inference
Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts 'RL as inference' and suggests a par…
Behaviour Suite for Reinforcement Learning
This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objective…
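A sketch of running a random baseline on one bsuite environment, written from memory of the bsuite README; treat the loader name and signature as assumptions and check github.com/deepmind/bsuite for the exact entry points:

```python
# Hedged sketch: bsuite.load_from_id is recalled from the README, not verified.
import numpy as np
import bsuite

env = bsuite.load_from_id('catch/0')        # returns a dm_env.Environment
num_actions = env.action_spec().num_values

rng = np.random.default_rng(0)
timestep = env.reset()
episode_return = 0.0
while not timestep.last():
    action = int(rng.integers(num_actions))  # uniformly random policy
    timestep = env.step(action)
    episode_return += timestep.reward or 0.0
print(episode_return)
```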
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundati…
Randomized Prior Functions for Deep Reinforcement Learning
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequent…
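The core construction, in a hedged sketch: each ensemble member's prediction is a trainable network plus beta times a fixed, never-trained "prior" network of the same family. The stub networks and the beta value below are illustrative, and training of the trainable part is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_random_net():
    """Stand-in for a randomly initialized neural network."""
    w1, w2 = rng.normal(size=(1, 16)), rng.normal(size=(16, 1))
    return lambda x: np.tanh(x @ w1) @ w2

def member_predict(trainable_net, prior_net, x, beta=3.0):
    # Gradients would flow through trainable_net only; prior_net stays frozen,
    # so members keep disagreeing wherever the data does not pin them down.
    return trainable_net(x) + beta * prior_net(x)

prior_net = make_random_net()      # fixed random prior, never trained
trainable_net = make_random_net()  # in practice: trained to fit the data
x = np.linspace(-2, 2, 5).reshape(-1, 1)
print(member_predict(trainable_net, prior_net, x).ravel())
```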
Scalable Coordinated Exploration in Concurrent Reinforcement Learning
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on…
Deep Q-learning from Demonstrations
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their …
Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are …
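A hedged sketch of the mechanism using the independent-Gaussian variant (the paper also describes a cheaper factorised-noise variant); mu and sigma would be the learned parameters, though this sketch omits training:

```python
import numpy as np

rng = np.random.default_rng(0)

class NoisyLinear:
    """Linear layer with learnable noise scale: w = mu_w + sigma_w * eps."""
    def __init__(self, n_in, n_out):
        self.mu_w = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
        self.sigma_w = np.full((n_in, n_out), 0.017)  # trained by SGD in practice
        self.mu_b = np.zeros(n_out)
        self.sigma_b = np.full(n_out, 0.017)

    def __call__(self, x):
        eps_w = rng.normal(size=self.mu_w.shape)      # resampled each forward pass
        eps_b = rng.normal(size=self.mu_b.shape)
        w = self.mu_w + self.sigma_w * eps_w
        b = self.mu_b + self.sigma_b * eps_b
        return x @ w + b

layer = NoisyLinear(4, 2)
x = rng.normal(size=(3, 4))
print(layer(x))   # stochastic output; the policy inherits this stochasticity
```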
The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we c…
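For reference, the exploitation-side Bellman equation the abstract alludes to is

$$Q^\pi(s_t, a_t) = \mathbb{E}\big[\, r_t + \gamma\, Q^\pi(s_{t+1}, a_{t+1}) \,\big],$$

and, as a schematic from memory rather than a quotation of the paper, the uncertainty analogue propagates a local uncertainty term $\nu$ through a similar recursion, roughly $u(s_t, a_t) = \nu(s_t, a_t) + \gamma^2\, \mathbb{E}\big[\, u(s_{t+1}, a_{t+1}) \,\big]$; the precise statement and its conditions are in the paper.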
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new informatio…
On Optimistic versus Randomized Exploration in Reinforcement Learning
We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair an…
Deep Exploration via Randomized Value Functions
We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to v…
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}( \sqrt{HSAT} + H^2S^2A+H\sqrt{T})$ …
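A schematic of the optimistic modification under illustrative assumptions (rewards in [0, 1], a crude Hoeffding-style count bonus); the bonus the paper uses to reach the minimax rate is more refined:

```python
import numpy as np

def optimistic_q_values(counts, rewards_sum, horizon, bonus_scale=1.0):
    """Finite-horizon optimistic value iteration from empirical estimates.

    counts[s, a, s']: observed transition counts.
    rewards_sum[s, a]: summed observed rewards, assumed in [0, 1] per step.
    """
    num_states, num_actions, _ = counts.shape
    n = np.maximum(counts.sum(axis=2), 1)           # visits to each (s, a)
    p_hat = counts / n[:, :, None]                  # empirical transitions
    r_hat = rewards_sum / n                         # empirical mean rewards
    bonus = bonus_scale * horizon / np.sqrt(n)      # optimism, shrinks with data
    q = np.zeros((horizon + 1, num_states, num_actions))
    for h in reversed(range(horizon)):
        v_next = q[h + 1].max(axis=1)               # greedy value at step h+1
        # Optimistic backup, clipped at the trivial upper bound H.
        q[h] = np.minimum(r_hat + bonus + p_hat @ v_next, horizon)
    return q
```

Acting greedily with respect to these optimistic Q-values drives exploration toward poorly visited state-action pairs, since their bonuses keep the estimates high until data accumulates.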