Ian Osband
OpenAI o1 System Card
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our mo…
Approximate Thompson Sampling via Epistemic Neural Networks
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neu…
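For context, exact Thompson sampling is straightforward when the posterior has a closed form; below is a minimal sketch for a Bernoulli bandit with Beta posteriors (the toy setup and all names are illustrative, not from the paper):

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, num_steps, seed=0):
    """Exact Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    num_arms = len(true_probs)
    successes = np.ones(num_arms)  # Beta alpha parameters
    failures = np.ones(num_arms)   # Beta beta parameters
    total_reward = 0.0
    for _ in range(num_steps):
        # Sample one plausible mean per arm from the posterior, act greedily.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], num_steps=1000))
```

The intractability the abstract points to arises when this exact posterior is replaced by a deep model; the paper's proposal is to draw approximate posterior samples from an epistemic neural network instead.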
Fine-Tuning Language Models via Epistemic Neural Networks
Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data. However, typical fine-tuning schemes do not prioritize the examples that they tune on. We show that, if you can prioritize…
Robustness of Epinets against Distributional Shifts
Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, u…
Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping
In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches …
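As a rough, hedged illustration of the recipe named in the title: each ensemble member is fit to a bootstrap resample of the data and carries its own fixed additive prior function. Everything below (linear features, the 0.5 prior scale, the toy data) is illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    return np.stack([x, np.ones(len(x))], axis=1)  # linear features [x, 1]

def fit_member(x, y, prior_w):
    """One member: learnable linear part plus a fixed random prior function."""
    idx = rng.integers(0, len(x), size=len(x))     # bootstrap resample
    phi = features(x[idx])
    residual = y[idx] - phi @ prior_w              # learn what the prior misses
    w, *_ = np.linalg.lstsq(phi, residual, rcond=None)
    return w

# Toy data; the ensemble should disagree more away from the observed x-range.
x = rng.uniform(-1, 1, size=50)
y = np.sin(3 * x) + 0.1 * rng.normal(size=50)
members = []
for _ in range(10):
    prior_w = 0.5 * rng.normal(size=2)             # fixed, never trained
    members.append((fit_member(x, y, prior_w), prior_w))

x_test = np.array([-3.0, 0.0, 3.0])
preds = np.stack([features(x_test) @ (w + pw) for w, pw in members])
print(preds.std(axis=0))  # typically larger spread far from the data
```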
Evaluating High-Order Predictive Distributions in Deep Learning
Most research on supervised learning has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive …
The Neural Testbed: Evaluating Joint Predictions
Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, th…
Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?
Posterior predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed, which provides tools for the systematic evaluation of agents that generate such predictions. Crucially…
Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions
A fundamental challenge for any intelligent system is prediction: given some inputs $X_1, \ldots, X_\tau$, can you predict outcomes $Y_1, \ldots, Y_\tau$? The KL divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction quality, b…
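Reconstructing the truncated setup with assumed notation (not quoted from the paper): writing $P^*$ for the environment's true conditional distribution and $\hat{P}$ for the agent's joint predictive, the measure is the expected log-likelihood ratio

$$\mathbf{d}_{\mathrm{KL}}\big(P^* \,\|\, \hat{P}\big) = \mathbb{E}\left[\log \frac{P^*(Y_1, \ldots, Y_\tau \mid X_1, \ldots, X_\tau)}{\hat{P}(Y_1, \ldots, Y_\tau \mid X_1, \ldots, X_\tau)}\right],$$

which at $\tau = 1$ reduces to the familiar marginal log-loss (up to a constant independent of the agent), while larger $\tau$ probes joint predictions.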
From Predictions to Decisions: The Importance of Joint Predictive Distributions
A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes? Most work on supervised learning has focused on producing accurate marginal predictions for each input. However, w…
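A toy example of why joint predictions matter (numbers illustrative, not from the paper): two agents with identical marginals can disagree sharply about joint events, and decisions often hinge on exactly such events:

```python
# tau coin flips. Agent A: i.i.d. fair coins. Agent B: one shared coin that
# is either always-heads or always-tails, each with probability 0.5.
# Both report marginal P(heads) = 0.5 per flip, yet their joint predictions
# for "all tau flips come up heads" differ exponentially in tau.
tau = 3
p_iid = 0.5 ** tau     # Agent A: 0.125
p_shared = 0.5         # Agent B: 0.5 * 1 + 0.5 * 0
print(p_iid, p_shared)
```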
Epistemic Neural Networks
Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joi…
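A heavily simplified sketch of the epinet interface: a small network receives an extra "epistemic index" z alongside the input, and varying z varies the joint prediction. The real architecture conditions the epinet on base-network features and splits it into trainable and fixed parts; the sizes and wiring below are illustrative only, and training is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random (untrained) MLP weights; training is omitted in this sketch."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

index_dim = 8
base = mlp([2, 32, 3])                 # big base network: input -> logits
epinet = mlp([2 + index_dim, 16, 3])   # small epinet: [input, index] -> delta

def predict(x, z):
    """Epinet-style prediction: base output plus index-dependent correction."""
    xz = np.concatenate([x, np.broadcast_to(z, (len(x), index_dim))], axis=1)
    return forward(base, x) + forward(epinet, xz)

x = rng.normal(size=(4, 2))
samples = np.stack([predict(x, rng.normal(size=index_dim)) for _ in range(5)])
print(samples.shape)  # (5, 4, 3): 5 index draws -> 5 distinct joint predictions
```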
Reinforcement Learning, Bit by Bit
Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deep…
Hypermodels for Exploration
We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its siz…
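A minimal sketch of the hypermodel interface under the simplest, linear assumption (all parameters below are illustrative and untrained): the hypermodel maps an index z to base-model parameters, so drawing z stands in for drawing a posterior sample, as in Thompson sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
index_dim, param_dim = 4, 3

# Linear hypermodel: theta(z) = a + B z. Sampling z ~ N(0, I) induces a
# distribution over base-model parameters, standing in for a posterior.
# In practice a and B are trained; here they are random placeholders.
a = rng.normal(size=param_dim)
B = 0.1 * rng.normal(size=(param_dim, index_dim))

def sample_model():
    z = rng.normal(size=index_dim)
    return a + B @ z          # one sampled parameter vector

def act(action_features):
    """Thompson-style choice: sample one model, act greedily under it."""
    theta = sample_model()
    return int(np.argmax(action_features @ theta))

actions = rng.normal(size=(5, param_dim))  # one feature vector per action
print(act(actions))
```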
Matrix games with bandit feedback
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix…
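To make the feedback model concrete, a toy round of play under illustrative assumptions (rock-paper-scissors payoffs, Gaussian payoff noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum matrix game: row player gets A[i, j], column player gets -A[i, j].
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])   # rock-paper-scissors payoffs

def play_round(row_policy, col_policy, noise_std=0.5):
    """Bandit feedback: both players see the actions and one noisy payoff."""
    i = rng.choice(len(row_policy), p=row_policy)
    j = rng.choice(len(col_policy), p=col_policy)
    noisy_payoff = A[i, j] + noise_std * rng.normal()
    return i, j, noisy_payoff    # the matrix A itself is never revealed

uniform = np.ones(3) / 3
print(play_round(uniform, uniform))
```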
Making Sense of Reinforcement Learning and Probabilistic Inference
Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts 'RL as inference' and suggests a par…
Behaviour Suite for Reinforcement Learning
This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objective…
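A sketch of running a random baseline on one bsuite environment, written from memory of the bsuite README; treat the loader name and signature as assumptions and check github.com/deepmind/bsuite for the exact entry points:

```python
# Hedged sketch: bsuite.load_from_id is recalled from the README, not verified.
import numpy as np
import bsuite

env = bsuite.load_from_id('catch/0')        # returns a dm_env.Environment
num_actions = env.action_spec().num_values

rng = np.random.default_rng(0)
timestep = env.reset()
episode_return = 0.0
while not timestep.last():
    action = int(rng.integers(num_actions))  # uniformly random policy
    timestep = env.step(action)
    episode_return += timestep.reward or 0.0
print(episode_return)
```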
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundati…
Randomized Prior Functions for Deep Reinforcement Learning
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequent…
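The core construction, in a hedged sketch: each ensemble member's prediction is a trainable network plus beta times a fixed, never-trained "prior" network of the same family. The stub networks and the beta value below are illustrative, and training of the trainable part is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_random_net():
    """Stand-in for a randomly initialized neural network."""
    w1, w2 = rng.normal(size=(1, 16)), rng.normal(size=(16, 1))
    return lambda x: np.tanh(x @ w1) @ w2

def member_predict(trainable_net, prior_net, x, beta=3.0):
    # Gradients would flow through trainable_net only; prior_net stays frozen,
    # so members keep disagreeing wherever the data does not pin them down.
    return trainable_net(x) + beta * prior_net(x)

prior_net = make_random_net()      # fixed random prior, never trained
trainable_net = make_random_net()  # in practice: trained to fit the data
x = np.linspace(-2, 2, 5).reshape(-1, 1)
print(member_predict(trainable_net, prior_net, x).ravel())
```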
Scalable Coordinated Exploration in Concurrent Reinforcement Learning
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on…
Deep Q-learning from Demonstrations
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their …
Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are …
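A hedged sketch of the mechanism using the independent-Gaussian variant (the paper also describes a cheaper factorised-noise variant); mu and sigma would be the learned parameters, though this sketch omits training:

```python
import numpy as np

rng = np.random.default_rng(0)

class NoisyLinear:
    """Linear layer with learnable noise scale: w = mu_w + sigma_w * eps."""
    def __init__(self, n_in, n_out):
        self.mu_w = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
        self.sigma_w = np.full((n_in, n_out), 0.017)  # trained by SGD in practice
        self.mu_b = np.zeros(n_out)
        self.sigma_b = np.full(n_out, 0.017)

    def __call__(self, x):
        eps_w = rng.normal(size=self.mu_w.shape)      # resampled each forward pass
        eps_b = rng.normal(size=self.mu_b.shape)
        w = self.mu_w + self.sigma_w * eps_w
        b = self.mu_b + self.sigma_b * eps_b
        return x @ w + b

layer = NoisyLinear(4, 2)
x = rng.normal(size=(3, 4))
print(layer(x))   # stochastic output; the policy inherits this stochasticity
```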
The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we c…
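For reference, the exploitation-side Bellman equation the abstract alludes to is

$$Q^\pi(s_t, a_t) = \mathbb{E}\big[\, r_t + \gamma\, Q^\pi(s_{t+1}, a_{t+1}) \,\big],$$

and, as a schematic from memory rather than a quotation of the paper, the uncertainty analogue propagates a local uncertainty term $\nu$ through a similar recursion, roughly $u(s_t, a_t) = \nu(s_t, a_t) + \gamma^2\, \mathbb{E}\big[\, u(s_{t+1}, a_{t+1}) \,\big]$; the precise statement and its conditions are in the paper.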
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new informatio…
On Optimistic versus Randomized Exploration in Reinforcement Learning
We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair an…
Deep Exploration via Randomized Value Functions
We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to v…
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}( \sqrt{HSAT} + H^2S^2A+H\sqrt{T})$ …
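A schematic of the optimistic modification under illustrative assumptions (rewards in [0, 1], a crude Hoeffding-style count bonus); the bonus the paper uses to reach the minimax rate is more refined:

```python
import numpy as np

def optimistic_q_values(counts, rewards_sum, horizon, bonus_scale=1.0):
    """Finite-horizon optimistic value iteration from empirical estimates.

    counts[s, a, s']: observed transition counts.
    rewards_sum[s, a]: summed observed rewards, assumed in [0, 1] per step.
    """
    num_states, num_actions, _ = counts.shape
    n = np.maximum(counts.sum(axis=2), 1)           # visits to each (s, a)
    p_hat = counts / n[:, :, None]                  # empirical transitions
    r_hat = rewards_sum / n                         # empirical mean rewards
    bonus = bonus_scale * horizon / np.sqrt(n)      # optimism, shrinks with data
    q = np.zeros((horizon + 1, num_states, num_actions))
    for h in reversed(range(horizon)):
        v_next = q[h + 1].max(axis=1)               # greedy value at step h+1
        # Optimistic backup, clipped at the trivial upper bound H.
        q[h] = np.minimum(r_hat + bonus + p_hat @ v_next, horizon)
    return q
```

Acting greedily with respect to these optimistic Q-values drives exploration toward poorly visited state-action pairs, since their bonuses keep the estimates high until data accumulates.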