Dustin Morrill
Composing Efficient, Robust Tests for Policy Selection
Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmenta…
Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration
Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to tha…
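The abstract above contrasts the SPG and NeuRD updates. A minimal sketch of that contrast in a single-state, logit-parameterized setting (function names and this bandit-style framing are my own illustration, not from the paper): the gradient of expected value with respect to logit $z_a$ under SPG is $\pi_a(q_a - v)$, while NeuRD applies the advantage $q_a - v$ to the logits without the $\pi_a$ weighting.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def spg_logit_update(z, q, lr):
    """Softmax policy gradient step on the logits: the gradient of the
    expected value with respect to logit z_a is pi_a * (q_a - v)."""
    pi = softmax(z)
    v = pi @ q
    return z + lr * pi * (q - v)

def neurd_logit_update(z, q, lr):
    """NeuRD step: the same advantage signal, but without the pi_a
    weighting, matching replicator dynamics in logit space."""
    pi = softmax(z)
    v = pi @ q
    return z + lr * (q - v)
```

The two updates move the logits in the same direction, but NeuRD's step size on an action does not shrink as that action's probability does, which is what keeps it from stalling on low-probability actions.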
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents …
The Partially Observable History Process
We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to…
Learning to Be Cautious
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best …
Hindsight and Sequential Rationality of Correlated Play
Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effec…
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents …
The Advantage Regret-Matching Actor-Critic
Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsidera…
Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization
Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is …
Bounds for Approximate Regret-Matching Algorithms
A dominant approach to solving large imperfect-information games is Counterfactual Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed …
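Each of the local regret minimization problems that CFR combines is typically solved with regret matching, which plays actions in proportion to their positive cumulative regret. A minimal sketch of that rule (the function name is my own illustration):

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Map cumulative regrets to a mixed strategy: each action is
    played in proportion to its positive cumulative regret, with a
    uniform fallback when no action has positive regret."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    n = len(cumulative_regrets)
    return np.full(n, 1.0 / n)
```

The uniform fallback matters: when all regrets are non-positive, any strategy is a valid choice for regret matching, and uniform is the conventional one.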
OpenSpiel: A Framework for Reinforcement Learning in Games
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot an…
Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent
In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prov…
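The objective this algorithm descends, exploitability, measures how much a worst-case opponent can gain against a fixed strategy. A minimal sketch for the matrix-game case (the function name and the `game_value` parameter are my own illustration; the paper treats extensive-form games):

```python
import numpy as np

def exploitability(payoff, pi, game_value=0.0):
    """Exploitability of the row player's mixed strategy `pi` in a
    two-player zero-sum matrix game with row payoff matrix `payoff`:
    how far below the game value a best-responding column player can
    drive the row player's expected payoff."""
    # Row player's expected payoff against each pure column action.
    row_values = pi @ payoff
    # The column player best-responds by minimizing the row payoff.
    return game_value - row_values.min()
```

A strategy is an approximate equilibrium exactly when its exploitability is near zero, which is why descending this quantity directly (against best-responding opponents) converges toward equilibrium.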
Neural Replicator Dynamics
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. …
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
Computer code based on continual problem re-solving beats human professional poker players at a two-player variant of poker.
Using Regret Estimation to Solve Games Compactly
Game theoretic solution concepts, such as Nash equilibrium strategies that are optimal against worst case opponents, provide guidance in finding desirable autonomous agent behaviour. In particular, we wish to approximate solutions to compl…
Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these est…