Martin R. Oswald
Open-Vocabulary Online Semantic Mapping for SLAM
This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline that we denote by its acronym, OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These are compute…
Visual Odometry with Transformers
Despite the rapid development of large 3D models, classical optimization-based approaches dominate the field of visual odometry (VO). Thus, current approaches to VO heavily rely on camera parameters and many handcrafted components, most of…
ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos
Achieving truly practical dynamic 3D reconstruction requires online operation, global pose and map consistency, detailed appearance modeling, and the flexibility to handle both RGB and RGB-D inputs. However, existing SLAM methods typically…
Physics-based Human Pose Estimation from a Single Moving RGB Camera
Most monocular and physics-based human pose tracking methods, while achieving state-of-the-art results, suffer from artifacts when the scene does not have a strictly flat ground plane or when the camera is moving. Moreover, these methods a…
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Cu…
ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration
Time-of-Flight (ToF) sensors provide efficient active depth sensing at relatively low power budgets; among such designs, only very sparse measurements from low-resolution sensors are considered to meet the increasingly limited power constr…
3D Gaussian Inverse Rendering with Approximated Global Illumination
3D Gaussian Splatting shows great potential in reconstructing photo-realistic 3D scenes. However, these methods typically bake illumination into their representations, limiting their use for physically-based rendering and scene editing. Al…
WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
We present WorldPose, a novel dataset for advancing research in multi-person global pose estimation in the wild, featuring footage from the 2022 FIFA World Cup. While previous datasets have primarily focused on local poses, often limited t…
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Simultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limi…
Learning High-level Semantic-Relational Concepts for SLAM
peer reviewed
TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
Spatial awareness is key to enable embodied multimodal AI systems. Yet, without vast amounts of spatial supervision, current Multimodal Large Language Models (MLLMs) struggle at this task. In this paper, we introduce TWIST & SCOUT, a frame…
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can…
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Open-Vocabulary Segmentation (OVS) methods offer promising capabilities in detecting unseen object categories, but the category must be known and needs to be provided by a human, either via a text prompt or pre-labeled datasets, thus limit…
Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians
3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-…
GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM
Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we pro…
How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evoluti…
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Neural RGBD SLAM techniques have shown promise in dense Simultaneous Localization And Mapping (SLAM), yet face challenges such as error accumulation during camera tracking resulting in distorted maps. In response, we introduce Loopy-SLAM t…
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications like games and map services. However, challenges arise from significant view changes and scene scale. Previous efforts mainly…
NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation
The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured da…
Rapid Optical Cytology with Deep Learning-Based Cell Segmentation for Diagnosis of Thyroid Lesions
We have developed and implemented a rapid, robust, and clinically viable protocol for fluorescence polarization cytopathology of thyroid nodules. The proposed approach utilizes rapid sample preparation and automated image analysis to accur…
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, scholars have been actively investigating efficacious self-supervised pre-training paradigms. Nevertheless, temporal…
Auto-Vocabulary Semantic Segmentation
Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, without training or fine-tuning. However, OVS methods typically require a human in the loo…
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD …
Union-over-Intersections: Object Detection beyond Winner-Takes-All
This paper revisits the problem of predicting box locations in object detection architectures. Typically, each box proposal or box query aims to directly maximize the intersection-over-union score with the ground truth, followed by a winne…
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robo…
Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery
Established surgical navigation systems for pedicle screw placement have been proven to be accurate, but still reveal limitations in registration or surgical guidance. Registration of preoperative data to the intraoperative anatomy remains…
Cross-View Outdoor Localization in Augmented Reality by Fusing Map and Satellite Data
Visual positioning is the task of finding the location of a given image and is necessary for augmented reality applications. Traditional algorithms solve this problem by matching against premade 3D point clouds or panoramic images. Recentl…
Relational Prior Knowledge Graphs for Detection and Instance Segmentation
Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate the effectiveness of using such relationships for object detection and in…