Sam Devlin
Adapting a World Model for Trajectory Following in a 3D Game
Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and being able to replicate a given trajectory is an integral part of it. In complex environments, like modern 3D video games, distribution shift and…
World and Human Action Models towards gameplay ideation
Generative artificial intelligence (AI) has the potential to transform creative industries through supporting human creative ideation, the generation of new ideas [1-5]. However, limitations in model capabilities raise key challenges in integr…
Scaling Laws for Pre-training Agents and World Models
The performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute. This has been demonstrated in domains from robotics to video games, when generative learning objectives on offline data…
Efficient Offline Reinforcement Learning: The Critic is Critical
Recent work has demonstrated both benefits and limitations from using supervised approaches (without temporal-difference learning) for offline reinforcement learning. While off-policy reinforcement learning provides a promising approach fo…
Aligning Agents like Large Language Models
Training agents to behave as desired in complex 3D environments from high-dimensional sensory information is challenging. Imitation learning from diverse human behavior provides a scalable approach for training an agent with a sensible beh…
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards modern games has been prohibitively expensive for the vast majority of the research community. Prior work in modern video …
Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games
We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game. To this end, we propose a novel AI agent with the goal of generating more human-like behavi…
Adaptive Scaffolding in Block-Based Programming via Synthesizing New Tasks as Pop Quizzes
Block-based programming environments are increasingly used to introduce computing concepts to beginners. However, novice students often struggle in these environments, given the conceptual and open-ended nature of programming tasks. To eff…
Trust-Region-Free Policy Optimization for Stochastic Policies
Trust Region Policy Optimization (TRPO) is an iterative method that simultaneously maximizes a surrogate objective and enforces a trust region constraint over consecutive policies in each iteration. The combination of the surrogate objecti…
Contrastive Meta-Learning for Partially Observable Few-Shot Learning
Many contrastive and meta-learning approaches learn representations by identifying common features in multiple views. However, the formalism for these approaches generally assumes features to be shared across views to be captured coherentl…
Imitating Human Behaviour with Diffusion Models
Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stoc…
UniMASK: Unified Inference in Sequential Decision Problems
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision-making,…
Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-polic…
Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making,…
How Humans Perceive Human-like Behavior in Video Game Navigation
The goal of this paper is to understand how people assess human-likeness in human- and AI-generated behavior. To this end, we present a qualitative study of hundreds of crowd-sourced assessments of human-likeness of behavior in a 3D video …
You May Not Need Ratio Clipping in PPO
Proximal Policy Optimization (PPO) methods learn a policy by iteratively performing multiple mini-batch optimization epochs of a surrogate objective with one set of sampled data. Ratio clipping PPO is a popular variant that clips the proba…
Trust Region Bounds for Decentralized PPO Under Non-stationarity
We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary. This new analysis provides a theoretical under…
Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under unc…
Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation
A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, speed and scalability a…
Rolling Horizon Evolutionary Algorithms for General Video Game Playing
Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on…
A Comparison of Self-Play Algorithms Under a Generalized Framework
Throughout scientific history, overarching theoretical frameworks have allowed researchers to grow beyond personal intuitions and culturally biased theories. They allow to verify and replicate existing findings, and to link is connected re…
Evaluating the Robustness of Collaborative Agents
In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in …
Evaluating the Robustness of Collaborative Agents
Artificial agents trained by deep reinforcement learning will likely encounter novel situations after deployment that were never seen during training. Our agent must be robust to handle such situations well. However, if we cannot rely on t…
Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
Agents that interact with other agents often do not know a priori what the other agents' strategies are, but have to maximise their own online return while interacting with and learning about others. The optimal adaptive behaviour under un…
Difference Rewards Policy Gradients
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing …
“It’s Unwieldy and It Takes a Lot of Time” — Challenges and Opportunities for Creating Agents in Commercial Games
Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games. As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it…