Ashwin Balakrishna
Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-e…
Robo-DM: Data Management For Large Robot Datasets
Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and…
Gemini Robotics: Bringing AI into the Physical World
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introdu…
A Taxonomy for Evaluating Generalist Robot Policies
Machine learning for robotics promises to unlock generalization to novel tasks and environments. Guided by this promise, many recent works have focused on scaling up robot data collection and developing larger, more expressive policies to …
Robot Data Curation with Mutual Information Estimators
The performance of imitation learning policies often hinges on the datasets with which they are trained. Consequently, investment in data collection for robotics has grown across both industrial and academic labs. However, despite the mark…
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals fo…
MANIP: A Modular Architecture for Integrating Interactive Perception for Robot Manipulation
We propose a modular systems architecture, MANIP, that can facilitate the design and development of robot manipulation systems by systematically combining learned subpolicies with well-established procedural algorithmic primitives such as …
Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene rep…
OpenVLA: An Open-Source Vision-Language-Action Model
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tu…
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting ro…
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, an…
Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration ch…
Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies
Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready…
Dynamics-Aware Comparison of Learned Reward Functions
The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, comparing reward functions, for example as a means of evaluating reward learning methods, presents a…
MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
Safe exploration is critical for using reinforcement learning (RL) in risk-sensitive environments. Recent work learns risk measures which measure the probability of violating constraints, which can then be used to enable safety. However, l…
LEGS: Learning Efficient Grasp Sets for Exploratory Grasping
While deep learning has enabled significant progress in designing general purpose robot grasping systems, there remain objects which still pose challenges for these systems. Recent work on Exploratory Grasping has formalized the problem of…
ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning
Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length o…
Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities
In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly. Kitting is a critical step as it can decrease downstream processing and handling times and enable lower storage and shipping costs…
LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks.
Reinforcement learning (RL) algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when…
LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks
Reinforcement learning (RL) has shown impressive success in exploring high-dimensional environments to learn complex tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrai…
Untangling Dense Non-Planar Knots by Learning Manipulation Features and Recovery Policies
Robot manipulation for untangling 1D deformable structures such as ropes, cables, and wires is challenging due to their infinite dimensional configuration space, complex dynamics, and tendency to self-occlude. Analytical controllers often …
Policy Gradient Bayesian Robust Optimization for Imitation Learning
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the huma…
Disentangling Dense Multi-Cable Knots
Disentangling two or more cables requires many steps to remove crossings between and within cables. We formalize the problem of disentangling multiple cables and present an algorithm, Iterative Reduction Of Non-planar Multiple cAble kNots …
Orienting Novel 3D Objects Using Self-Supervised Learning of Rotation Transforms
Orienting objects is a critical component in the automation of many packing and assembly tasks. We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation. We formulate a self-s…
LazyDAgger: Reducing Context Switching in Interactive Imitation Learning
Corrective interventions while a robot is learning to automate a task provide an intuitive method for a human supervisor to assist the robot and convey information about desired behavior. However, these interventions can impose significant…
Usage of Particle Swarm Optimization to Improve the Performance of Supervised Classifiers
Representing the data appropriately will have a significant effect on the outcome produced by the classifier. Transforming the feature will help to represent the data points in a more suitable way for the classifier. Particle swarm optimiz…