Matthieu Geist
Rate optimal learning of equilibria from data
We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting…
Space Robotics Bench: Robot Learning Beyond Earth
The growing ambition for space exploration demands robust autonomous systems that can operate in unstructured environments under extreme extraterrestrial conditions. The adoption of robot learning in this domain is severely hindered by the…
Learning Tool-Aware Adaptive Compliant Control for Autonomous Regolith Excavation
Autonomous regolith excavation is a cornerstone of in-situ resource utilization for a sustained human presence beyond Earth. However, this task is fundamentally hindered by the complex interaction dynamics of granular media and the operati…
Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning
Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the populat…
Convergence of regularized agent-state-based Q-learning in POMDPs
In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an age…
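As a toy illustration of the setting this abstract describes (not the paper's algorithm), here is tabular Q-learning where the table is indexed by an agent state, a recursively updated summary of past observations, rather than the hidden environment state. The agent-state update rule and all names below are illustrative assumptions:

```python
from collections import defaultdict

def update_agent_state(z, obs, action):
    """Toy recursive agent-state update: remember the last observation-action pair."""
    return (obs, action)

def q_learning_step(Q, z, action, reward, z_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update performed on agent states z -> z_next."""
    target = reward + gamma * max(Q[(z_next, a)] for a in actions)
    Q[(z, action)] += alpha * (target - Q[(z, action)])
    return Q

Q = defaultdict(float)       # Q-table over (agent_state, action) pairs
actions = [0, 1]
z = ("start", None)          # initial agent state
obs, a, r = "o1", 1, 1.0     # one observed transition
z_next = update_agent_state(z, obs, a)
q_learning_step(Q, z, a, r, z_next, actions)
```

The point of the sketch is only that the recursion never touches the true POMDP state, which is what makes the convergence analysis non-standard.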
RoboRAN: A Unified Robotics Framework for Reinforcement Learning-Based Autonomous Navigation
Autonomous robots must navigate and operate in diverse environments, from terrestrial and aquatic settings to aerial and space domains. While Reinforcement Learning (RL) has shown promise in training policies for specific autonomous robots…
ShiQ: Bringing back Bellman to LLMs
The fine-tuning of pre-trained large language models (LLMs) using reinforcement learning (RL) is generally formulated as direct policy optimization. This approach was naturally favored as it efficiently improves a pretrained LLM, seen as a…
Command A: An Enterprise-Ready Large Language Model
In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languag…
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), have emerged as alternatives to online Reinforcement Learning from Human Feedback (RLHF) algorithms such as Proxim…
DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories (peer reviewed)
This investigation introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments. Floating platforms serve as versatile test-beds to emulate micro-gravity environment…
Solving robust MDPs as a sequence of static RL problems
Designing control policies whose performance level is guaranteed to remain above a given threshold in a span of environments is a critical feature for the adoption of reinforcement learning (RL) in real-world applications. The search for s…
Imitating Language via Scalable Inverse Reinforcement Learning
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability …
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often si…
Averaging log-likelihoods in direct alignment
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to lea…
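As a hedged illustration of the idea suggested by the title (the exact loss in the paper may differ), here is the difference between scoring a sequence by the sum of its token log-probabilities and by their length-normalized average. Summing favors shorter sequences; averaging removes that length bias:

```python
import math

def seq_loglik(token_probs, average=False):
    """Sequence log-likelihood: summed, or averaged over sequence length."""
    ll = sum(math.log(p) for p in token_probs)
    return ll / len(token_probs) if average else ll

short_seq = [0.5, 0.5]            # 2 tokens, moderately likely each
long_seq = [0.7, 0.7, 0.7, 0.7]   # 4 tokens, each more likely

# Summed log-likelihood ranks the short sequence higher;
# the length-averaged version reverses the ranking.
summed_prefers_short = seq_loglik(short_seq) > seq_loglik(long_seq)
averaged_prefers_long = seq_loglik(short_seq, average=True) < seq_loglik(long_seq, average=True)
```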
Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space (peer reviewed)
The ability to autonomously assemble structures is crucial for the development of future space infrastructure. However, the unpredictable conditions of space pose significant challenges for robotic systems, necessitating the development of…
Time-Constrained Robust MDPs
Robust reinforcement learning is essential for deploying reinforcement learning algorithms in real-world scenarios where environmental uncertainty predominates. Traditional robust reinforcement learning often depends on rectangularity assu…
RRLS: Robust Reinforcement Learning Suite
Robust reinforcement learning is the problem of learning control policies that provide optimal worst-case performance against a span of adversarial environments. It is a crucial ingredient for deploying algorithms in real-world scenarios w…
Bootstrapping Expectiles in Reinforcement Learning
Many classic Reinforcement Learning (RL) algorithms rely on a Bellman operator, which involves an expectation over the next states, leading to the concept of bootstrapping. To introduce a form of pessimism, we propose to replace this expec…
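A minimal sketch of the quantity involved (illustrative only; the paper's estimator and implementation details are not reproduced here): the tau-expectile of a sample minimizes an asymmetrically weighted squared loss, coincides with the mean at tau = 0.5, and lies below the mean for tau < 0.5, which is the source of pessimism when it replaces the expectation in a bootstrapped target:

```python
def expectile(values, tau=0.3, iters=100, lr=0.1):
    """Compute the tau-expectile by gradient descent on the asymmetric squared loss.

    Residuals above the current estimate are weighted by tau,
    residuals below it by (1 - tau); tau = 0.5 recovers the mean.
    """
    x = sum(values) / len(values)  # start at the mean (the 0.5-expectile)
    for _ in range(iters):
        grad = sum(2 * (tau if v > x else 1 - tau) * (x - v) for v in values)
        x -= lr * grad / len(values)
    return x

vals = [0.0, 1.0, 2.0]
mean_like = expectile(vals, tau=0.5)       # equals the mean, 1.0
pessimistic = expectile(vals, tau=0.3)     # strictly below the mean
```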
Self-Improving Robust Preference Optimization
Online and offline RLHF methods, such as PPO and DPO, have been highly successful in aligning AI with human preferences. Despite their success, however, these methods suffer from fundamental limitations: (a) Models trained with RLHF can le…
Learning Discrete-Time Major-Minor Mean Field Games
Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and can…
Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves popu…
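For intuition on the policy update named in the title, here is generic online mirror descent over a probability simplex with the entropy mirror map (a standard construction; the paper's deep, population-aware variant is not reproduced here): the policy is a softmax of accumulated Q-values, so each iteration only adds the latest Q-values to a running sum:

```python
import math

def omd_policy(cum_q, lr=1.0):
    """Entropy-mirror-map OMD policy: softmax of the accumulated Q-values."""
    logits = [lr * q for q in cum_q]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Accumulate Q-values over two iterations for a 2-action problem.
cum_q = [0.0, 0.0]
for q_t in ([1.0, 0.0], [0.5, 0.2]):         # illustrative per-iteration Q-values
    cum_q = [c + q for c, q in zip(cum_q, q_t)]
policy = omd_policy(cum_q)                   # favors action 0
```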
MusicRL: Aligning Music Generation to Human Preferences
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are use…
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-sought property is one of the few ways in which RL methods based on dynamic-programming differ from RL …
A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of R…
Nash Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, …
Offline Reinforcement Learning with On-Policy Q-Function Regularization
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior wor…