Marco Pavone
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understa…
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors i…
The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving
Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use…
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Gender bias in vision-language foundation models (VLMs) raises concerns about their safe deployment and is typically evaluated using benchmarks with gender annotations on real-world images. However, as these benchmarks often contain spurio…
Benchmarking the operation of quantum heuristics and Ising machines: scoring parameter setting strategies on optimization applications
We discuss guidelines for evaluating the performance of parameterized stochastic solvers for optimization problems, with particular attention to systems that employ novel hardware, such as digital quantum processors running variational alg…
Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Mar…
CUPID: Curating Data your Robot Loves with Influence Functions
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - suc…
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to development and validation o…
Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving
Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures, owing to their scalability and potential to leverage internet-scale pretraining for generalization. According…
Pseudo-Simulation for Autonomous Driving
Existing evaluation paradigms for Autonomous Vehicles (AVs) face critical limitations. Real-world evaluation is often challenging due to safety concerns and a lack of reproducibility, whereas closed-loop simulation can face insufficient re…
E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models
Spatial intelligence, encompassing 3D reconstruction, perception, and reasoning, is fundamental to applications such as robotics, aerial imaging, and extended reality. A key enabler is the real-time, accurate estimation of core 3D attribut…
RealDrive: Retrieval-Augmented Driving with Diffusion Models
Learning-based planners generate natural human-like driving behaviors by learning to reason about nuanced interactions from data, overcoming the rigid behaviors that arise from rule-based planners. Nonetheless, data-driven approaches often…
Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning
Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Spl…
Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning
While foundation models offer promise toward improving robot safety in out-of-distribution (OOD) scenarios, how to effectively harness their generalist knowledge for real-time, dynamically feasible response remains a crucial problem. We pr…
Generative AI for Autonomous Driving: Frontiers and Opportunities
Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This rev…
Deep Learning Warm Starts for Trajectory Optimization on the International Space Station
Trajectory optimization is a cornerstone of modern robot autonomy, enabling systems to compute trajectories and controls in real-time while respecting safety and physical constraints. However, it has seen limited usage in spaceflight appli…
Deformable Cargo Transport in Microgravity with Astrobee
We present pyastrobee: a simulation environment and control stack for Astrobee in Python, with an emphasis on cargo manipulation and transport tasks. We also demonstrate preliminary success from a sampling-based MPC controller, using reduc…
Describe Anything: Detailed Localized Image and Video Captioning
Generating detailed and accurate descriptions for specific regions in images and videos remains a fundamental challenge for vision-language models. We introduce the Describe Anything Model (DAM), a model designed for detailed localized cap…
Taming High-Dimensional Dynamics: Learning Optimal Projections onto Spectral Submanifolds
High-dimensional nonlinear systems pose considerable challenges for modeling and control across many domains, from fluid mechanics to advanced robotics. Such systems are typically approximated with reduced-order models, which often rely on…
Discovering dominant dynamics for nonlinear continuum robot control
Continuum robots, which emulate biological organisms’ dexterity and flexibility, hold transformative potential for terrestrial and extraterrestrial applications. While such capabilities present significant modeling and control challenges, …
It’s All in the Mix: Technology choice between driverless and human-driven vehicles in sharing systems
Operators of vehicle-sharing systems such as carsharing or ride-hailing can benefit from integrating driverless vehicles into their fleet. In this context, we study the impact of optimal fleet size and composition on an operator's profitab…
Online Aggregation of Trajectory Predictors
Trajectory prediction, the task of forecasting future agent behavior from past data, is central to safe and efficient autonomous driving. A diverse set of methods (e.g., rule-based or learned with different architectures and datasets) have…
Surprise Potential as a Measure of Interactivity in Driving Scenarios
Validating the safety and performance of an autonomous vehicle (AV) requires benchmarking on real-world driving logs. However, typical driving logs contain mostly uneventful scenarios with minimal interactions between road users. Identifyi…
A Convexity-Dependent Two-Phase Training Algorithm for Deep Neural Networks
The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of the loss function. The most decisive among these p…
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
DreamDrive: Generative 4D Scene Modeling from Street View Images
Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize g…
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes
We present STORM, a spatio-temporal reconstruction model designed for reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic reconstruction methods often rely on per-scene optimization, dense observations across s…
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models
Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained m…
Extrapolated Urban View Synthesis Benchmark
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate …