Karl Pertsch
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains a…
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Generalist robots that can perform a range of different tasks in open-world settings must be able to not only reason about the steps needed to accomplish their goals, but also process complex instructions, prompts, and even feedback during…
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Autoregressive sequence models, such as Transformer-based vision-language action (VLA) policies, can be tremendously effective for capturing complex and generalizable robotic behaviors. However, such models require us to choose a tokenizat…
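One way to picture the tokenization question this abstract raises: smooth action chunks compress well in a frequency basis, so the coefficients of a discrete cosine transform can be quantized into integer tokens. This is a hedged sketch of that general idea, not the paper's exact pipeline; the chunk shape, `scale` factor, and rounding scheme are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def tokenize_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a (T, D) action chunk to integer tokens via a per-dimension DCT.

    Low-frequency DCT coefficients capture smooth trajectories, so most
    quantized coefficients are near zero and compress well downstream.
    """
    coeffs = dct(actions, axis=0, norm="ortho")       # (T, D) frequency coefficients
    return np.round(coeffs * scale).astype(np.int64)  # coarse integer quantization

def detokenize_chunk(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantized DCT back to a (T, D) action chunk."""
    return idct(tokens.astype(np.float64) / scale, axis=0, norm="ortho")

# A smooth 16-step, 7-DoF chunk round-trips with small reconstruction error,
# bounded by the quantization step size.
t = np.linspace(0, 1, 16)[:, None]
chunk = np.sin(2 * np.pi * t) * np.ones((1, 7))
recon = detokenize_chunk(tokenize_chunk(chunk))
print(float(np.abs(chunk - recon).max()))
```

The quantized coefficient grid would then typically be flattened and compressed further before being fed to an autoregressive model; that stage is omitted here.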
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the…
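The "flow" in the title can be pictured as sampling actions by integrating a learned velocity field from Gaussian noise toward the data. A minimal sketch under strong assumptions: the stand-in `velocity` below is the ideal field for a linear noise-to-action path and peeks at the target, whereas an actual model learns a conditional field from observations and language.

```python
import numpy as np

def velocity(a_t: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    # Stand-in for a learned field v_theta(a_t, t | observation): for a linear
    # noise-to-action path, the ideal velocity points straight at the target.
    return (target - a_t) / (1.0 - t)

def sample_actions(target: np.ndarray, steps: int = 10, seed: int = 0) -> np.ndarray:
    """Euler-integrate the velocity field from Gaussian noise (t=0) to t=1."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(target.shape)        # start from noise
    for k in range(steps):
        t = k / steps
        a = a + velocity(a, t, target) / steps   # Euler step with dt = 1/steps
    return a

target = np.array([0.3, -0.1, 0.5])  # pretend "ground truth" action chunk
print(np.allclose(sample_actions(target), target))  # True: flow reaches target
```

With the ideal linear-path field, the final Euler step lands exactly on the target; a trained network only approximates this, so real samplers trade step count against accuracy.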
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language process…
Affordance-Guided Reinforcement Learning via Visual Prompting
Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existi…
OpenVLA: An Open-Source Vision-Language-Action Model
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tu…
Octo: An Open-Source Generalist Robot Policy
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet gen…
Evaluating Real-World Robot Manipulation Policies in Simulation
The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policie…
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting ro…
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (L…
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded…
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, i…
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with gen…
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing …
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to pr…
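The title's "autoregressive Q-functions" can be pictured with a toy: each action dimension is discretized, and the agent maximizes Q one dimension at a time, conditioning later dimensions on earlier picks. The critic below is a hand-made stand-in; `BINS`, `toy_q`, and the shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

BINS = 5  # each continuous action dimension is discretized into BINS bins

def select_action(q_fn, n_dims: int) -> list:
    """Greedy autoregressive maximization: pick one discretized dimension at a
    time, conditioning the critic for later dimensions on earlier choices."""
    prefix = []
    for dim in range(n_dims):
        q_values = q_fn(tuple(prefix), dim)   # one Q-value per bin of this dim
        prefix.append(int(np.argmax(q_values)))
    return prefix

def toy_q(prefix, dim):
    # Stand-in for a learned per-dimension critic Q_dim(s, a_<dim, ·): here the
    # best bin depends on the previous choice, so selection must be sequential.
    target = (dim + (prefix[-1] if prefix else 0)) % BINS
    return -np.square(np.arange(BINS) - target).astype(float)

print(select_action(toy_q, n_dims=3))  # best bin per dim, chosen left to right
```

The sequential argmax keeps maximization tractable: instead of searching BINS**n_dims joint bins, each step scores only BINS candidates.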
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end train…
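A simple way to put continuous robot actions into a language model's output space, as the abstract describes for end-to-end control, is uniform binning: each action dimension maps to one of a fixed number of discrete tokens. A sketch under assumed bounds and bin count (both illustrative).

```python
import numpy as np

N_BINS = 256  # one "vocabulary entry" per bin, appended to the LM vocabulary

def actions_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Discretize each continuous action dimension into one of N_BINS token ids."""
    norm = (np.clip(action, low, high) - low) / (high - low)          # -> [0, 1]
    return np.minimum((norm * N_BINS).astype(np.int64), N_BINS - 1)   # -> [0, 255]

def tokens_to_actions(tokens: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map token ids back to the continuous value at each bin's center."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

a = np.array([0.0, -1.0, 0.73])
toks = actions_to_tokens(a)
print(toks, tokens_to_actions(toks))  # round-trip error is at most half a bin
```

Because the tokens share the model's vocabulary, action prediction becomes ordinary next-token prediction, at the cost of a quantization error of at most half a bin width.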
PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection
Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as…
RT-1: Robotics Transformer for Real-World Control at Scale
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capabil…
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotat…
Cross-Domain Transfer via Semantic Skill Imitation
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. In…
Task-Induced Representation Learning
In this work, we evaluate the effectiveness of representation learning approaches for decision making in visually complex environments. Representation learning is essential for effective reinforcement learning (RL) from high-dimensional in…
Skill-based Meta-Reinforcement Learning
While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible. To mitigate this issue, meta-rei…
Demonstration-Guided Reinforcement Learning with Learned Skills
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every ne…
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. I…
Accelerating Reinforcement Learning with Learned Skill Priors
Intelligent agents rely heavily on prior experience when learning a new task, yet most modern reinforcement learning (RL) approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned o…