Markus Wulfmeier
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock study of intermediate-mass black hole mergers and binary black hole eccentricity and provide early warnings for multimessenger observations of binary neutr…
Exploiting Policy Idling for Dexterous Manipulation
Learning-based methods for dexterous manipulation have made notable progress in recent years. However, learned policies often still lack reliability and exhibit limited robustness to important factors of variation. One failure pattern that…
Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement
Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale: Unlike behavior cloning or offline reinforcement learning, IfO can leverage action-free demonstrations and thus circumvents the need for cos…
Using cognitive models to reveal value trade-offs in language models
Value trade-offs are an integral part of human decision-making and language use, however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cognitive science, so-called "cognitive model…
Aligning Large Language Models with Human Feedback: Mathematical Foundations and Algorithm Design
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
The success of Large Language Models (LLMs) has sparked interest in various agentic applications. A key hypothesis is that LLMs, leveraging common sense and Chain-of-Thought (CoT) reasoning, can effectively explore and efficiently solve co…
Imitating Language via Scalable Inverse Reinforcement Learning
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability …
Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning
We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including a…
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration cha…
Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning
Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard …
Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots
Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly …
Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities
Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by the growth of required resources, expansive datasets and corresponding investments into computing infrastructure. Although earlier successes pred…
Replay across Experiments: A Natural Extension of Off-Policy RL
Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimall…
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to imp…
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World
Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation …
Towards A Unified Agent with Foundation Models
Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In t…
Massively Scalable Inverse Reinforcement Learning in Google Maps
Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of …
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environme…
SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration
The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitation…
Solving Continuous Control via Q-learning
While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most acto…
MO2: Model-Based Offline Options
The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck state…
Figure Data for the paper "From Motor Control to Team Play in Simulated Humanoid Football"
Data Release for Article: From Motor Control to Team Play in Simulated Humanoid Football. This package releases a set of Python notebooks, each reproducing a quantitative figure featured in the research article "Fro…
Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data
Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robots should perform well in all of the en…
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors
We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a move…
The Challenges of Exploration for Offline Reinforcement Learning
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the o…
Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies
For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a h…
Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge t…