Holger Caesar
ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
In this paper, we present details of the 1st W-CODA workshop, held in conjunction with ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception…
A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion p…
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
Scene flow estimation aims to recover per-point motion from two adjacent LiDAR scans. However, in real-world applications such as autonomous driving, points rarely move independently of others, especially for nearby points belonging to the…
Towards Vision Zero: The Accid3nD Dataset
Even though a significant amount of work has been done to increase the safety of transportation networks, accidents still occur regularly. They must be understood as unavoidable and sporadic outcomes of traffic networks. No public dataset …
LeAP: Consistent multi-domain 3D labeling using Foundation Models
Availability of datasets is a strong driver for research on 3D semantic understanding, and whilst obtaining unlabeled 3D point cloud data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. …
OSSA: Unsupervised One-Shot Style Adaptation
Despite their success in various vision tasks, deep neural network architectures often underperform in out-of-distribution scenarios due to the difference between training and target domain style. To address this limitation, we introduce O…
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling the structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, hence limiting …
Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving
This paper introduces the Bosch street dataset (BSD), a novel multi-modal large-scale dataset aimed at promoting highly automated driving (HAD) and advanced driver-assistance systems (ADAS) research. Unlike existing datasets, BSD offers a …
Redefining Automotive Radar Imaging: A Domain-Informed 1D Deep Learning Approach for High-Resolution and Efficient Performance
Millimeter-wave (mmWave) radars are indispensable for perception tasks of autonomous vehicles, thanks to their resilience in challenging weather conditions. Yet, their deployment is often limited by insufficient spatial resolution for prec…
Label-Efficient 3D Object Detection For Road-Side Units
Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous…
UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes
Unsupervised 3D object detection methods have emerged to leverage vast amounts of data without requiring manual labels for training. Recent approaches rely on dynamic objects for learning to detect mobile objects but penalize the detection…
DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
The perception of autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and the performance of radar-based perception is still i…
Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving
Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world…
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Scene flow characterizes the 3D motion between two LiDAR scans captured by an autonomous vehicle at nearby timesteps. Prevalent methods consider scene flow as point-wise unconstrained flow vectors that can be learned by either large-scale …
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
Panoptic Scene Graph Generation (PSG) aims at achieving a comprehensive image understanding by simultaneously segmenting objects and predicting relations among objects. However, the long-tail problem among relations leads to unsatisfactory…
Graph Convolutional Networks for Complex Traffic Scenario Classification
A scenario-based testing approach can reduce the time required to obtain statistically significant evidence of the safety of Automated Driving Systems (ADS). Identifying these scenarios in an automated manner is a challenging task. Most me…
BaSAL: Size-Balanced Warm Start Active Learning for LiDAR Semantic Segmentation
Active learning strives to reduce the need for costly data annotation, by repeatedly querying an annotator to label the most informative samples from a pool of unlabeled data, and then training a model from these samples. We identify two p…
Offline Tracking with Object Permanence
To reduce the expensive labor cost of manually labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion sce…
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation
Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly c…
UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities
Multi-sensor object detection is an active research topic in automated driving, but the robustness of such detection models against missing sensor input (modality missing), e.g., due to a sudden sensor failure, is a critical problem which …
Lanelet2 for nuScenes: Map Dataset
Lanelet2 maps for the nuScenes dataset, which enable the usage of diverse map-based anchor paths and spatial semantic information. For details see our paper and project page. We also provide a pip package to facilitate the usage. The maps …
SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation
This work addresses cross-view camera pose estimation, i.e., determining the 3-Degrees-of-Freedom camera pose of a given ground-level image w.r.t. an aerial image of the local area. We propose SliceMatch, which consists of ground and aeria…
Lanelet2 for nuScenes: Enabling Spatial Semantic Relationships and Diverse Map-based Anchor Paths
Motion prediction and planning are key components to enable autonomous driving. Although high definition (HD) maps provide important contextual information that constrains the action space of traffic participants, most approaches are not a…
Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking
Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments. As LiDARs provide accurate illumination-independent geometric depictions of the scene, performin…
NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles
In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited the progress in …