Explanipedia

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail Open

Nvidia, Yan Wang, Wenjie Luo, Junjie Bai, et al. · 2025

End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understa…

Preventing Robotic Jailbreaking via Multimodal Domain Adaptation Open

Francesco Marchiori, Rohan Sinha, Christopher Agia, Alexander Robey, George J. Pappas, et al. · 2025

Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors i…

The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving Open

Jay Patrikar, Apoorva Sharma, Sushant Veer, Boyi Li, Sebastian Scherer, et al. · 2025

Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use…

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation Open

Yusuke Hirota, Ryo Hachiuma, Boyi Li, Ximing Lu, Michael Ross Boone, et al. · 2025

Gender bias in vision-language foundation models (VLMs) raises concerns about their safe deployment and is typically evaluated using benchmarks with gender annotations on real-world images. However, as these benchmarks often contain spurio…

Benchmarking the operation of quantum heuristics and Ising machines: scoring parameter setting strategies on optimization applications Open

David E. Bernal Neira, Robin Brown, Pratik Sathe, Filip Wudarski, Marco Pavone, et al. · 2025

We discuss guidelines for evaluating the performance of parameterized stochastic solvers for optimization problems, with particular attention to systems that employ novel hardware, such as digital quantum processors running variational alg…

Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions Open

Longfei Li, Zhiwen Fan, Wenyan Cong, Xinhang Liu, Yuyang Yin, et al. · 2025

Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Mar…

CUPID: Curating Data your Robot Loves with Influence Functions Open

Christopher Agia, Rohan Sinha, Jingyun Yang, Rika Antonova, Marco Pavone, et al. · 2025

In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - suc…

Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis Open

Yuan Gao, M. Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, et al. · 2025

For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to development and validation o…

Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving Open

Boris Ivanovic, Cristiano Saltori, Yurong You, Yan Wang, Wenjie Luo, et al. · 2025

Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures, owing to their scalability and potential to leverage internet-scale pretraining for generalization. According…

Pseudo-Simulation for Autonomous Driving Open

Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, et al. · 2025

Existing evaluation paradigms for Autonomous Vehicles (AVs) face critical limitations. Real-world evaluation is often challenging due to safety concerns and a lack of reproducibility, whereas closed-loop simulation can face insufficient re…

E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models Open

Wenyan Cong, Yiqing Liang, Yancheng Zhang, Ziyi Yang, Yan Wang, et al. · 2025

Spatial intelligence, encompassing 3D reconstruction, perception, and reasoning, is fundamental to applications such as robotics, aerial imaging, and extended reality. A key enabler is the real-time, accurate estimation of core 3D attribut…

RealDrive: Retrieval-Augmented Driving with Diffusion Models Open

Wenhao Ding, Sushant Veer, Yuxiao Chen, Yulong Cao, Chaowei Xiao, et al. · 2025

Learning-based planners generate natural human-like driving behaviors by learning to reason about nuanced interactions from data, overcoming the rigid behaviors that arise from rule-based planners. Nonetheless, data-driven approaches often…

Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning Open

Amine Elhafsi, Daniel Morton, Marco Pavone · 2025

Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Spl…

Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning Open

M. R. Ganai, Rohan Sinha, Christopher Agia, Daniel Morton, Luigi Di Lillo, et al. · 2025

While foundation models offer promise toward improving robot safety in out-of-distribution (OOD) scenarios, how to effectively harness their generalist knowledge for real-time, dynamically feasible response remains a crucial problem. We pr…

Generative AI for Autonomous Driving: Frontiers and Opportunities Open

Yuping Wang, Shuo Xing, Can Cui, Renjie Li, Hong Hua, et al. · 2025

Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This rev…

Deep Learning Warm Starts for Trajectory Optimization on the International Space Station Open

Somrita Banerjee, Abhishek Cauligi, Marco Pavone · 2025

Trajectory optimization is a cornerstone of modern robot autonomy, enabling systems to compute trajectories and controls in real-time while respecting safety and physical constraints. However, it has seen limited usage in spaceflight appli…

Deformable Cargo Transport in Microgravity with Astrobee Open

Daniel Morton, Rika Antonova, Brian Coltin, Marco Pavone, Jeannette Bohg · 2025

We present pyastrobee: a simulation environment and control stack for Astrobee in Python, with an emphasis on cargo manipulation and transport tasks. We also demonstrate preliminary success from a sampling-based MPC controller, using reduc…

Describe Anything: Detailed Localized Image and Video Captioning Open

Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, et al. · 2025

Generating detailed and accurate descriptions for specific regions in images and videos remains a fundamental challenge for vision-language models. We introduce the Describe Anything Model (DAM), a model designed for detailed localized cap…

Taming High-Dimensional Dynamics: Learning Optimal Projections onto Spectral Submanifolds Open

Hugo Buurmeijer, Luis A. Pabon, John Irvin Alora, Roshan S. Kaundinya, George Haller, et al. · 2025

High-dimensional nonlinear systems pose considerable challenges for modeling and control across many domains, from fluid mechanics to advanced robotics. Such systems are typically approximated with reduced-order models, which often rely on…

Discovering dominant dynamics for nonlinear continuum robot control Open

John Irvin Alora, Mattia Cenedese, George Haller, Marco Pavone · 2025

Continuum robots, which emulate biological organisms’ dexterity and flexibility, hold transformative potential for terrestrial and extraterrestrial applications. While such capabilities present significant modeling and control challenges, …

It’s All in the Mix: Technology choice between driverless and human-driven vehicles in sharing systems Open

Layla Martin, Stefan Minner, Marco Pavone, Maximilian Schiffer · 2025

Operators of vehicle-sharing systems such as carsharing or ride-hailing can benefit from integrating driverless vehicles into their fleet. In this context, we study the impact of optimal fleet size and composition on an operator's profitab…

Online Aggregation of Trajectory Predictors Open

Alexander Tong, Apoorva Sharma, Sushant Veer, Marco Pavone, Heng Yang · 2025

Trajectory prediction, the task of forecasting future agent behavior from past data, is central to safe and efficient autonomous driving. A diverse set of methods (e.g., rule-based or learned with different architectures and datasets) have…

Surprise Potential as a Measure of Interactivity in Driving Scenarios Open

Wenhao Ding, Sushant Veer, Karen Leung, Yulong Cao, Marco Pavone · 2025

Validating the safety and performance of an autonomous vehicle (AV) requires benchmarking on real-world driving logs. However, typical driving logs contain mostly uneventful scenarios with minimal interactions between road users. Identifyi…

A Convexity-Dependent Two-Phase Training Algorithm for Deep Neural Networks Open

Tomas Hrycej, Bernhard Bermeitinger, Marco Pavone, G. Wiegand, Siegfried Handschuh · 2025

The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of the loss function. The most decisive among these p…

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences Open

Yusuke Hirota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, et al. · 2025

LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search Open

Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, et al. · 2025

DreamDrive: Generative 4D Scene Modeling from Street View Images Open

Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, et al. · 2024

Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize g…

STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes Open

Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, et al. · 2024

We present STORM, a spatio-temporal reconstruction model designed for reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic reconstruction methods often rely on per-scene optimization, dense observations across s…

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models Open

Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, et al. · 2024

Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained m…

Extrapolated Urban View Synthesis Benchmark Open

Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, et al. · 2024

Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate …

Marco Pavone