Matthieu Geist
Rate optimal learning of equilibria from data
We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting…
Space Robotics Bench: Robot Learning Beyond Earth
The growing ambition for space exploration demands robust autonomous systems that can operate in unstructured environments under extreme extraterrestrial conditions. The adoption of robot learning in this domain is severely hindered by the…
Learning Tool-Aware Adaptive Compliant Control for Autonomous Regolith Excavation
Autonomous regolith excavation is a cornerstone of in-situ resource utilization for a sustained human presence beyond Earth. However, this task is fundamentally hindered by the complex interaction dynamics of granular media and the operati…
Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning
Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the populat…
Convergence of regularized agent-state-based Q-learning in POMDPs
In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an age…
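As a toy illustration of the setting this abstract describes (not the paper's algorithm), here is tabular Q-learning where the table is indexed by an agent state, a recursively updated summary of past observations, rather than the hidden environment state. The agent-state update rule and all names below are illustrative assumptions:

```python
from collections import defaultdict

def update_agent_state(z, obs, action):
    """Toy recursive agent-state update: remember the last observation-action pair."""
    return (obs, action)

def q_learning_step(Q, z, action, reward, z_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update performed on agent states z -> z_next."""
    target = reward + gamma * max(Q[(z_next, a)] for a in actions)
    Q[(z, action)] += alpha * (target - Q[(z, action)])
    return Q

Q = defaultdict(float)       # Q-table over (agent_state, action) pairs
actions = [0, 1]
z = ("start", None)          # initial agent state
obs, a, r = "o1", 1, 1.0     # one observed transition
z_next = update_agent_state(z, obs, a)
q_learning_step(Q, z, a, r, z_next, actions)
```

The point of the sketch is only that the recursion never touches the true POMDP state, which is what makes the convergence analysis non-standard.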
RoboRAN: A Unified Robotics Framework for Reinforcement Learning-Based Autonomous Navigation
Autonomous robots must navigate and operate in diverse environments, from terrestrial and aquatic settings to aerial and space domains. While Reinforcement Learning (RL) has shown promise in training policies for specific autonomous robots…
ShiQ: Bringing back Bellman to LLMs
The fine-tuning of pre-trained large language models (LLMs) using reinforcement learning (RL) is generally formulated as direct policy optimization. This approach was naturally favored as it efficiently improves a pretrained LLM, seen as a…
Command A: An Enterprise-Ready Large Language Model
In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languag…
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), have emerged as alternatives to online Reinforcement Learning from Human Feedback (RLHF) algorithms such as Proxim…
DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories (peer reviewed)
This investigation introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments. Floating platforms serve as versatile test-beds to emulate micro-gravity environment…
Solving robust MDPs as a sequence of static RL problems
Designing control policies whose performance level is guaranteed to remain above a given threshold in a span of environments is a critical feature for the adoption of reinforcement learning (RL) in real-world applications. The search for s…
Imitating Language via Scalable Inverse Reinforcement Learning
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability …
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often si…
Averaging log-likelihoods in direct alignment
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to lea…
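As a hedged illustration of the idea suggested by the title (the exact loss in the paper may differ), here is the difference between scoring a sequence by the sum of its token log-probabilities and by their length-normalized average. Summing favors shorter sequences; averaging removes that length bias:

```python
import math

def seq_loglik(token_probs, average=False):
    """Sequence log-likelihood: summed, or averaged over sequence length."""
    ll = sum(math.log(p) for p in token_probs)
    return ll / len(token_probs) if average else ll

short_seq = [0.5, 0.5]            # 2 tokens, moderately likely each
long_seq = [0.7, 0.7, 0.7, 0.7]   # 4 tokens, each more likely

# Summed log-likelihood ranks the short sequence higher;
# the length-averaged version reverses the ranking.
summed_prefers_short = seq_loglik(short_seq) > seq_loglik(long_seq)
averaged_prefers_long = seq_loglik(short_seq, average=True) < seq_loglik(long_seq, average=True)
```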
Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space (peer reviewed)
The ability to autonomously assemble structures is crucial for the development of future space infrastructure. However, the unpredictable conditions of space pose significant challenges for robotic systems, necessitating the development of…
Time-Constrained Robust MDPs
Robust reinforcement learning is essential for deploying reinforcement learning algorithms in real-world scenarios where environmental uncertainty predominates. Traditional robust reinforcement learning often depends on rectangularity assu…
RRLS: Robust Reinforcement Learning Suite
Robust reinforcement learning is the problem of learning control policies that provide optimal worst-case performance against a span of adversarial environments. It is a crucial ingredient for deploying algorithms in real-world scenarios w…
Bootstrapping Expectiles in Reinforcement Learning
Many classic Reinforcement Learning (RL) algorithms rely on a Bellman operator, which involves an expectation over the next states, leading to the concept of bootstrapping. To introduce a form of pessimism, we propose to replace this expec…
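A minimal sketch of the quantity involved (illustrative only; the paper's estimator and implementation details are not reproduced here): the tau-expectile of a sample minimizes an asymmetrically weighted squared loss, coincides with the mean at tau = 0.5, and lies below the mean for tau < 0.5, which is the source of pessimism when it replaces the expectation in a bootstrapped target:

```python
def expectile(values, tau=0.3, iters=100, lr=0.1):
    """Compute the tau-expectile by gradient descent on the asymmetric squared loss.

    Residuals above the current estimate are weighted by tau,
    residuals below it by (1 - tau); tau = 0.5 recovers the mean.
    """
    x = sum(values) / len(values)  # start at the mean (the 0.5-expectile)
    for _ in range(iters):
        grad = sum(2 * (tau if v > x else 1 - tau) * (x - v) for v in values)
        x -= lr * grad / len(values)
    return x

vals = [0.0, 1.0, 2.0]
mean_like = expectile(vals, tau=0.5)       # equals the mean, 1.0
pessimistic = expectile(vals, tau=0.3)     # strictly below the mean
```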
Self-Improving Robust Preference Optimization
Online and offline RLHF methods, such as PPO and DPO, have been highly successful in aligning AI with human preferences. Despite their success, however, these methods suffer from fundamental limitations: (a) Models trained with RLHF can le…
Learning Discrete-Time Major-Minor Mean Field Games
Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and can…
Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves popu…
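For intuition on the policy update named in the title, here is generic online mirror descent over a probability simplex with the entropy mirror map (a standard construction; the paper's deep, population-aware variant is not reproduced here): the policy is a softmax of accumulated Q-values, so each iteration only adds the latest Q-values to a running sum:

```python
import math

def omd_policy(cum_q, lr=1.0):
    """Entropy-mirror-map OMD policy: softmax of the accumulated Q-values."""
    logits = [lr * q for q in cum_q]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Accumulate Q-values over two iterations for a 2-action problem.
cum_q = [0.0, 0.0]
for q_t in ([1.0, 0.0], [0.5, 0.2]):         # illustrative per-iteration Q-values
    cum_q = [c + q for c, q in zip(cum_q, q_t)]
policy = omd_policy(cum_q)                   # favors action 0
```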
MusicRL: Aligning Music Generation to Human Preferences
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are use…
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-sought property is one of the few ways in which RL methods based on dynamic-programming differ from RL …
A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of R…
Nash Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, …
Offline Reinforcement Learning with On-Policy Q-Function Regularization
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior wor…