Raphaël Avalos
Deep SPI: Safe Policy Improvement via World Models
Safe policy improvement (SPI) offers theoretical control over policy updates, yet existing guarantees largely concern offline, tabular reinforcement learning (RL). We study SPI in general online settings, when combined with world model and…
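For context, safe policy improvement results are usually stated as a high-probability lower bound on the performance of the updated policy relative to the baseline it was derived from. A generic form from the SPI literature (given here only as background, not the exact guarantee proved in this paper) is

\Pr\big[\, \rho(\pi, M^*) \;\geq\; \rho(\pi_b, M^*) - \zeta \,\big] \;\geq\; 1 - \delta,

where \pi_b is the baseline (behaviour) policy, \pi the improved policy, \rho(\cdot, M^*) the expected return in the true MDP M^*, \zeta > 0 an admissible performance loss, and 1 - \delta the confidence level.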
Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings
The competitive and cooperative forces of natural selection have driven the evolution of intelligence for millions of years, culminating in nature's vast biodiversity and the complexity of human minds. Inspired by this process, we propose …
Online Planning in POMDPs with State-Requests
In key real-world problems, full state information is sometimes available but only at a high cost, such as activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observabili…
Laser Learning Environment: A new environment for coordination-critical multi-agent tasks
We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take s…
Dynamic Size Message Scheduling for Multi-Agent Communication under Limited Bandwidth
Communication plays a vital role in multi-agent systems, fostering collaboration and coordination. However, in real-world scenarios where communication is bandwidth-limited, existing multi-agent reinforcement learning (MARL) algorithms oft…
The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models
Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such, the agent needs to reason while taking past observations and actions into account. However, …
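As background for the abstract above, the textbook way to summarise past observations and actions is a belief, a distribution over states updated by Bayes' rule after taking action a and receiving observation o:

b'(s') \;\propto\; O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s),

with T the transition function, O the observation function, and b the current belief. Computing this update exactly requires knowing T and O, which is one reason to learn approximate belief updates, as the title above suggests.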
Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning
Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the…
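To make "factorized value functions" concrete, here is a minimal sketch of the simplest such factorization (VDN-style additive mixing), assuming PyTorch; the function name vdn_joint_q is illustrative only, and this is background on the general idea, not the Local Advantage Networks architecture proposed in the paper.

import torch

def vdn_joint_q(per_agent_qs: torch.Tensor) -> torch.Tensor:
    # per_agent_qs: tensor of shape (batch, n_agents) holding each agent's
    # Q-value for its chosen action. VDN factorizes the joint value as a sum,
    # so maximizing the joint value decomposes into per-agent argmaxes.
    return per_agent_qs.sum(dim=1)

# Example: a batch of 2 joint observations with 3 agents each.
q_tot = vdn_joint_q(torch.tensor([[1.0, 0.5, -0.2], [0.3, 0.3, 0.3]]))

This additive form is what richer mixers (e.g., monotonic mixing networks) generalize, at the cost of the more convoluted network structures the abstract refers to.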