Davide Scaramuzza
The Reality Gap in Robotics: Challenges, Solutions, and Best Practices
Machine learning has facilitated significant advancements across various robotics domains, including navigation, locomotion, and manipulation. Many such achievements have been driven by the extensive use of simulation as a critical tool fo…
What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework
Object-Goal Navigation (ObjectNav) is a critical component toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this context, a robot must locate target objects in previously unse…
Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection
Autonomous inspection is a central problem in robotics, with applications ranging from industrial monitoring to search-and-rescue. Traditionally, inspection has often been reduced to navigation tasks, where the objective is to reach a pred…
Event Spectroscopy: Event-based Multispectral and Depth Sensing using Structured Light
Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional…
Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation
Learning control policies in simulation enables rapid, safe, and cost-effective development of advanced robotic capabilities. However, transferring these policies to the real world remains difficult due to the sim-to-real gap, where unmode…
A roadmap for AI in robotics
Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024
Visually impaired individuals face significant challenges navigating and interacting with unknown situations, particularly in tasks requiring spatial awareness and semantic scene understanding. To accelerate the development and evaluate th…
Perturbed State Space Feature Encoders for Optical Flow with Event Cameras
With their motion-responsive nature, event-based cameras offer significant advantages over traditional cameras for optical flow estimation. While deep learning has improved upon traditional methods, current neural networks adopted for even…
LiDAR Registration with Visual Foundation Models
LiDAR registration is a fundamental task in robotic mapping and localization. A critical component of aligning two point clouds is identifying robust point correspondences using point descriptors. This step becomes particularly challenging…
A Monocular Event-Camera Motion Capture System
Motion capture systems are a widespread tool in research to record ground-truth poses of objects. Commercial systems use reflective markers attached to the object and then triangulate pose of the object from multiple camera views. Conseque…
Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
Differentiable simulators represent an environment's dynamics as a differentiable function. Within robotics and autonomous driving, this property is used in Analytic Policy Gradients (APG), which relies on backpropagating through the dynam…
Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight
Autonomous drone racing has risen as a challenging robotic benchmark for testing the limits of learning, perception, planning, and control. Expert human pilots are able to agilely fly a drone through a race track by mapping the real-time f…
Multi-Aerial Robotic System for Power Line Inspection and Maintenance: Comparative Analysis From the AERIAL-CORE Final Experiments
GG-SSMs: Graph-Generating State Space Models
State Space Models (SSMs) are powerful tools for modeling sequential data in computer vision and time series analysis domains. However, traditional SSMs are limited by fixed, one-dimensional sequential processing, which restricts their abi…
Multi-Task Reinforcement Learning for Quadrotors
Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle wit…
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
We present GEM, a Generalizable Ego-vision Multimodal world model that predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories. Hence, our model has precise control over object dynamics, ego-agent…
Student-Informed Teacher Training
Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images. In this framework, a teacher is trained with privileged task information, while a studen…
Drift-free Visual SLAM using Digital Twins
Globally-consistent localization in urban environments is crucial for autonomous systems such as self-driving vehicles and drones, as well as assistive technologies for visually impaired people. Traditional Visual-Inertial Odometry (VIO) a…
Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor
We present the first static-obstacle avoidance method for quadrotors using just an onboard, monocular event camera. Quadrotors are capable of fast and agile flight in cluttered environments when piloted manually, but vision-based autonomou…
Environment as Policy: Learning to Race in Unseen Tracks
Reinforcement learning (RL) has achieved outstanding success in complex robot control tasks, such as drone racing, where the RL agents have outperformed human champions in a known racing track. However, these agents fail in unseen track co…
Learning Quadrotor Control From Visual Features Using Differentiable Simulation
The sample inefficiency of reinforcement learning (RL) remains a significant challenge in robotics. RL requires large-scale simulation and can still cause long training times, slowing research and innovation. This issue is particularly pro…
S7: Selective and Simplified State Space Layers for Sequence Modeling
A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substan…
Residual Policy Learning for Perceptive Quadruped Control Using Differentiable Simulation
First-order Policy Gradient (FoPG) algorithms such as Backpropagation through Time and Analytical Policy Gradients leverage local simulation physics to accelerate policy search, significantly improving sample efficiency in robot control co…
FaVoR: Features via Voxel Rendering for Camera Relocalization
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerou…
Structure-Invariant Range-Visual-Inertial Odometry
The Mars Science Helicopter (MSH) mission aims to deploy the next generation of unmanned helicopters on Mars, targeting landing sites in highly irregular terrain such as Valles Marineris, the largest canyons in the Solar system with elevat…
Reinforcement Learning Meets Visual Odometry
Visual Odometry (VO) is essential to downstream mobile robotics and augmented/virtual reality tasks. Despite recent advances, existing VO methods still rely on heuristic design choices that require several weeks of hyperparameter tuning by…
Demonstrating Agile Flight from Pixels without State Estimation
Quadrotors are among the most agile flying robots. Despite recent advances in learning-based control and computer vision, autonomous drones still rely on explicit state estimation. On the other hand, human pilots only rely on a first-perso…
Low-latency automotive vision with event cameras
The computer vision algorithms used currently in advanced driver assistance systems rely on image-based RGB cameras, leading to a critical bandwidth–latency trade-off for delivering safe driving experiences. To address this, event cameras …
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and T…