Markus Wulfmeier
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock study of intermediate-mass black hole mergers and binary black hole eccentricity and provide early warnings for multimessenger observations of binary neutr…
Exploiting Policy Idling for Dexterous Manipulation
Learning-based methods for dexterous manipulation have made notable progress in recent years. However, learned policies often still lack reliability and exhibit limited robustness to important factors of variation. One failure pattern that…
Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement
Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale: Unlike behavior cloning or offline reinforcement learning, IfO can leverage action-free demonstrations and thus circumvents the need for cos…
Using cognitive models to reveal value trade-offs in language models
Value trade-offs are an integral part of human decision-making and language use, however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cognitive science, so-called "cognitive model…
Aligning Large Language Models with Human Feedback: Mathematical Foundations and Algorithm Design
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
The success of Large Language Models (LLMs) has sparked interest in various agentic applications. A key hypothesis is that LLMs, leveraging common sense and Chain-of-Thought (CoT) reasoning, can effectively explore and efficiently solve co…
Imitating Language via Scalable Inverse Reinforcement Learning
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability …
Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning
We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including a…
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration cha…
Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning
Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard …
Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots
Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly …
Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities
Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by the growth of required resources, expansive datasets and corresponding investments into computing infrastructure. Although earlier successes pred…
Replay across Experiments: A Natural Extension of Off-Policy RL
Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimall…
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to imp…
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World
Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation …
Towards A Unified Agent with Foundation Models
Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In t…
Massively Scalable Inverse Reinforcement Learning in Google Maps
Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of …
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environme…
SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration
The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitation…
Solving Continuous Control via Q-learning
While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most acto…
MO2: Model-Based Offline Options
The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck state…
Figure Data for the paper "From Motor Control to Team Play in Simulated Humanoid Football"
Data Release for Article: From Motor Control to Team Play in Simulated Humanoid Football. This package releases a set of Python notebooks, each reproducing a quantitative figure featured in the research article "Fro…
Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data
Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robots should perform well in all of the en…
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors
We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a move…
The Challenges of Exploration for Offline Reinforcement Learning
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the o…
Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies
For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a h…
Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge t…