Daniel Cremers
Learned free-energy functionals from pair-correlation matching for dynamical density functional theory Open
Classical density functional theory (cDFT) and dynamical density functional theory (DDFT) are modern statistical mechanical theories for modeling many-body colloidal systems at the one-body density level. The theories hinge on knowing the …
When and Where do Events Switch in Multi-Event Video Generation? Open
Text-to-video (T2V) generation has surged in response to challenging questions, especially when a long video must depict multiple sequential events with temporal coherence and controllable content. Existing methods that extend to multi-eve…
ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models Open
In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. However, obtaining large-scale labeled ground-truth data for event-based vision…
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata Open
Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while opera…
TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route Open
Humans can interpret geospatial information through natural language, while the geospatial cognition capabilities of Large Language Models (LLMs) remain underexplored. Prior research in this domain has been constrained by non-quantifiable …
LADB: Latent Aligned Diffusion Bridges for Semi-Supervised Domain Translation Open
Diffusion models excel at generating high-quality outputs but face challenges in data-scarce domains, where exhaustive retraining or costly paired data are often required. To address these limitations, we propose Latent Aligned Diffusion B…
ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association Open
We present ViSTA-SLAM as a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-vie…
ECHO: Ego-Centric modeling of Human-Object interactions Open
Modeling human-object interactions (HOI) from an egocentric perspective is a largely unexplored yet important problem due to the increasing adoption of wearable devices, such as smart glasses and watches. We investigate how much informatio…
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images Open
Volumetric scene reconstruction from a single image is crucial for a broad range of applications like autonomous driving and robotics. Recent volumetric reconstruction methods achieve impressive results, but generally require expensive 3D …
GECO: Geometrically Consistent Embedding with Lightspeed Inference Open
Recent advances in feature learning have shown that self-supervised vision foundation models can capture semantic correspondences but often lack awareness of underlying 3D geometry. GECO addresses this gap by producing geometrically cohere…
CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry Open
Visual Odometry (VO) is fundamental to autonomous navigation, robotics, and augmented reality, with unsupervised approaches eliminating the need for expensive ground-truth labels. However, these methods struggle when dynamic objects violat…
Beyond Complete Shapes: A Benchmark for Quantitative Evaluation of 3D Shape Surface Matching Algorithms Open
Finding correspondences between 3D deformable shapes is an important and long‐standing problem in geometry processing, computer vision, graphics, and beyond. While various shape matching datasets exist, they are mostly static or limited in…
The Monado SLAM Dataset for Egocentric Visual-Inertial Tracking Open
Humanoid robots and mixed reality headsets benefit from the use of head-mounted sensors for tracking. While advancements in visual-inertial odometry (VIO) and simultaneous localization and mapping (SLAM) have produced new and high-quality …
Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach Open
Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated its applicability across various domains, including robotics, healthcare, energy optimization, and autonomous driving. However, a critical question remains: How rob…
True Multimodal In-Context Learning Needs Attention to the Visual Context Open
Multimodal Large Language Models (MLLMs), built on powerful language backbones, have enabled Multimodal In-Context Learning (MICL): adapting to new tasks from a few multimodal demonstrations consisting of images, questions, and answers. Des…
FacaDiffy: Inpainting unseen facade parts using diffusion models Open
High-detail semantic 3D building models are frequently utilized in robotics, geoinformatics, and computer vision. One key aspect of creating such models is employing 2D conflict maps that detect openings’ locations in building facades. Yet…
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion Open
Semantic scene completion (SSC) aims to infer both the 3D geometry and semantics of a scene from single images. In contrast to prior work on SSC that heavily relies on expensive ground-truth annotations, we approach SSC in an unsupervised …
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals Open
Semantic Scene Completion (SSC) has emerged as a pivotal approach for jointly learning scene geometry and semantics, enabling downstream applications such as navigation in mobile robotics. The recent generalization to Panoptic Scene Comple…
Highly Accurate and Diverse Traffic Data: The DeepScenario Open 3D Dataset Open
Accurate 3D trajectory data is crucial for advancing autonomous driving. Yet, traditional datasets are usually captured by fixed sensors mounted on a car and are susceptible to occlusion. Additionally, such an approach can precisely recons…
Shape Your Ground: Refining Road Surfaces Beyond Planar Representations Open
Road surface reconstruction from aerial images is fundamental for autonomous driving, urban planning, and virtual simulation, where smoothness, compactness, and accuracy are critical quality factors. Existing reconstruction methods often p…
PRaDA: Projective Radial Distortion Averaging Open
We tackle the problem of automatic calibration of radially distorted cameras in challenging conditions. Accurately determining distortion parameters typically requires either 1) solving the full Structure from Motion (SfM) problem involvin…
Scene-Centric Unsupervised Panoptic Segmentation Open
Unsupervised panoptic segmentation aims to partition an image into semantically meaningful regions and distinct object instances without training on manually annotated data. In contrast to prior work on unsupervised panoptic scene understa…
ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient Open
Classical Bundle Adjustment (BA) methods require accurate initial estimates for convergence and typically assume known camera intrinsics, which limits their applicability when such information is uncertain or unavailable. We propose a nove…
OPAL: Visibility-aware LiDAR-to-OpenStreetMap Place Recognition via Adaptive Radial Fusion Open
LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments. Existing approaches predominantly depend on pre-built 3D dense maps or aerial imagery, which impos…
TwoSquared: 4D Generation from 2D Image Pairs Open
Despite the astonishing progress in generative AI, 4D dynamic object generation remains an open challenge. With limited high-quality training data and heavy computing requirements, the combination of hallucinating unseen geometry together …
RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning Open
Semantic 3D city models are easily accessible worldwide, providing accurate, object-oriented, and semantic-rich 3D priors. To date, their potential to mitigate the noise impact on radar object detection remains under-explored. In this paper,…
PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation Open
Despite the advancements in 3D full-shape generation, accurately modeling complex geometries and semantics of shape parts remains a significant challenge, particularly for shapes with varying numbers of parts. Current methods struggle to e…