Timon Willi
The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multi-agent scenarios, interacting with human users and other agents in cooperative and competitive settings. This will require new reasoning skills…
Mixtures of Experts Unlock Parameter Scaling for Deep RL
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning …
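For context only (an illustrative form of such empirical scaling laws, not a result from this paper), performance in supervised learning is typically summarised as a power law in model size, for example

    L(N) ≈ (N_c / N)^{α_N}

where N is the parameter count, L is the test loss, and N_c, α_N are empirically fitted constants.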
Analysing the Sample Complexity of Opponent Shaping
Learning in general-sum games often yields collectively sub-optimal results. Addressing this, opponent shaping (OS) methods actively guide the learning processes of other agents, empirically leading to improved individual and group perform…
The Danger Of Arrogance: Welfare Equilibria As A Solution To Stackelberg Self-Play In Non-Coincidental Games
The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-s…
Scaling Opponent Shaping to High Dimensional Games
In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes. To address this issue, opponent shaping (OS) methods explicitly learn to influence the learning dynamics o…
Leading the Pack: N-player Opponent Shaping
Reinforcement learning solutions have seen great success in the 2-player general-sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able…
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research. Traditionally, RL environments run on the CPU, which limits their scalabi…
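To illustrate the idea behind accelerator-native environments (a minimal sketch using a toy environment, not the JaxMARL API), a pure JAX step function can be vectorised with vmap so thousands of environment copies step in parallel on a GPU/TPU:

import jax
import jax.numpy as jnp

def env_step(state, action):
    # Toy dynamics for illustration: the state drifts by the action,
    # and the reward is the negative distance from the origin.
    next_state = state + action
    reward = -jnp.abs(next_state)
    return next_state, reward

# Vectorise and compile one step over 4096 independent environment copies.
batched_step = jax.jit(jax.vmap(env_step))

states = jnp.zeros(4096)
actions = 0.1 * jnp.ones(4096)
next_states, rewards = batched_step(states, actions)
print(next_states.shape, rewards.shape)  # (4096,), (4096,)

Because the rollout stays on the accelerator, the batch size, rather than CPU environment throughput, becomes the main scaling knob.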
Adversarial Cheap Talk
Adversarial attacks in reinforcement learning (RL) often assume highly privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversa…
Model-Free Opponent Shaping
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning w…
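As a concrete reminder of why defect-defect is the collectively worst-case outcome (a sketch using standard textbook prisoner's dilemma payoffs, assumed here for illustration only):

# Actions: C = cooperate, D = defect.
C, D = 0, 1
payoff = {  # (row action, column action) -> (row reward, column reward)
    (C, C): (-1, -1),
    (C, D): (-3,  0),
    (D, C): ( 0, -3),
    (D, D): (-2, -2),
}

# Defecting is a best response to either opponent action...
assert payoff[(D, C)][0] > payoff[(C, C)][0]
assert payoff[(D, D)][0] > payoff[(C, D)][0]
# ...so mutual defection is the unique Nash equilibrium, yet it is
# Pareto-dominated by mutual cooperation.
assert all(c > d for c, d in zip(payoff[(C, C)], payoff[(D, D)]))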
COLA: Consistent Learning with Opponent-Learning Awareness
Learning in general-sum games is unstable and frequently leads to socially undesirable (Pareto-dominated) outcomes. To mitigate this, Learning with Opponent-Learning Awareness (LOLA) introduced opponent shaping to this setting, by accounti…
Recurrent Neural Processes
We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but …