Karl Pertsch
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains a…
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Generalist robots that can perform a range of different tasks in open-world settings must be able to not only reason about the steps needed to accomplish their goals, but also process complex instructions, prompts, and even feedback during…
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Autoregressive sequence models, such as Transformer-based vision-language action (VLA) policies, can be tremendously effective for capturing complex and generalizable robotic behaviors. However, such models require us to choose a tokenizat…
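One way to picture the tokenization question this abstract raises: smooth action chunks compress well in a frequency basis, so the coefficients of a discrete cosine transform can be quantized into integer tokens. This is a hedged sketch of that general idea, not the paper's exact pipeline; the chunk shape, `scale` factor, and rounding scheme are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def tokenize_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a (T, D) action chunk to integer tokens via a per-dimension DCT.

    Low-frequency DCT coefficients capture smooth trajectories, so most
    quantized coefficients are near zero and compress well downstream.
    """
    coeffs = dct(actions, axis=0, norm="ortho")       # (T, D) frequency coefficients
    return np.round(coeffs * scale).astype(np.int64)  # coarse integer quantization

def detokenize_chunk(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantized DCT back to a (T, D) action chunk."""
    return idct(tokens.astype(np.float64) / scale, axis=0, norm="ortho")

# A smooth 16-step, 7-DoF chunk round-trips with small reconstruction error,
# bounded by the quantization step size.
t = np.linspace(0, 1, 16)[:, None]
chunk = np.sin(2 * np.pi * t) * np.ones((1, 7))
recon = detokenize_chunk(tokenize_chunk(chunk))
print(float(np.abs(chunk - recon).max()))
```

The quantized coefficient grid would then typically be flattened and compressed further before being fed to an autoregressive model; that stage is omitted here.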
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the…
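The "flow" in the title can be pictured as sampling actions by integrating a learned velocity field from Gaussian noise toward the data. A minimal sketch under strong assumptions: the stand-in `velocity` below is the ideal field for a linear noise-to-action path and peeks at the target, whereas an actual model learns a conditional field from observations and language.

```python
import numpy as np

def velocity(a_t: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    # Stand-in for a learned field v_theta(a_t, t | observation): for a linear
    # noise-to-action path, the ideal velocity points straight at the target.
    return (target - a_t) / (1.0 - t)

def sample_actions(target: np.ndarray, steps: int = 10, seed: int = 0) -> np.ndarray:
    """Euler-integrate the velocity field from Gaussian noise (t=0) to t=1."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(target.shape)        # start from noise
    for k in range(steps):
        t = k / steps
        a = a + velocity(a, t, target) / steps   # Euler step with dt = 1/steps
    return a

target = np.array([0.3, -0.1, 0.5])  # pretend "ground truth" action chunk
print(np.allclose(sample_actions(target), target))  # True: flow reaches target
```

With the ideal linear-path field, the final Euler step lands exactly on the target; a trained network only approximates this, so real samplers trade step count against accuracy.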
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language process…
Affordance-Guided Reinforcement Learning via Visual Prompting
Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existi…
OpenVLA: An Open-Source Vision-Language-Action Model
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tu…
Octo: An Open-Source Generalist Robot Policy
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet gen…
Evaluating Real-World Robot Manipulation Policies in Simulation
The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policie…
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting ro…
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (L…
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded…
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, i…
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with gen…
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing …
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to pr…
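The title's "autoregressive Q-functions" can be pictured with a toy: each action dimension is discretized, and the agent maximizes Q one dimension at a time, conditioning later dimensions on earlier picks. The critic below is a hand-made stand-in; `BINS`, `toy_q`, and the shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

BINS = 5  # each continuous action dimension is discretized into BINS bins

def select_action(q_fn, n_dims: int) -> list:
    """Greedy autoregressive maximization: pick one discretized dimension at a
    time, conditioning the critic for later dimensions on earlier choices."""
    prefix = []
    for dim in range(n_dims):
        q_values = q_fn(tuple(prefix), dim)   # one Q-value per bin of this dim
        prefix.append(int(np.argmax(q_values)))
    return prefix

def toy_q(prefix, dim):
    # Stand-in for a learned per-dimension critic Q_dim(s, a_<dim, ·): here the
    # best bin depends on the previous choice, so selection must be sequential.
    target = (dim + (prefix[-1] if prefix else 0)) % BINS
    return -np.square(np.arange(BINS) - target).astype(float)

print(select_action(toy_q, n_dims=3))  # best bin per dim, chosen left to right
```

The sequential argmax keeps maximization tractable: instead of searching BINS**n_dims joint bins, each step scores only BINS candidates.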
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end train…
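A simple way to put continuous robot actions into a language model's output space, as the abstract describes for end-to-end control, is uniform binning: each action dimension maps to one of a fixed number of discrete tokens. A sketch under assumed bounds and bin count (both illustrative).

```python
import numpy as np

N_BINS = 256  # one "vocabulary entry" per bin, appended to the LM vocabulary

def actions_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Discretize each continuous action dimension into one of N_BINS token ids."""
    norm = (np.clip(action, low, high) - low) / (high - low)          # -> [0, 1]
    return np.minimum((norm * N_BINS).astype(np.int64), N_BINS - 1)   # -> [0, 255]

def tokens_to_actions(tokens: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map token ids back to the continuous value at each bin's center."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

a = np.array([0.0, -1.0, 0.73])
toks = actions_to_tokens(a)
print(toks, tokens_to_actions(toks))  # round-trip error is at most half a bin
```

Because the tokens share the model's vocabulary, action prediction becomes ordinary next-token prediction, at the cost of a quantization error of at most half a bin width.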
PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection
Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as…
RT-1: Robotics Transformer for Real-World Control at Scale
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capabil…
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotat…
Cross-Domain Transfer via Semantic Skill Imitation
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. In…
Task-Induced Representation Learning
In this work, we evaluate the effectiveness of representation learning approaches for decision making in visually complex environments. Representation learning is essential for effective reinforcement learning (RL) from high-dimensional in…
Skill-based Meta-Reinforcement Learning
While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible. To mitigate this issue, meta-rei…
Demonstration-Guided Reinforcement Learning with Learned Skills
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every ne…
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. I…
Accelerating Reinforcement Learning with Learned Skill Priors
Intelligent agents rely heavily on prior experience when learning a new task, yet most modern reinforcement learning (RL) approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned o…