Bastian Leibe
Faster VGGT with Block-Sparse Global Attention
Efficient and accurate feed-forward multi-view reconstruction has long been an important task in computer vision. Recent transformer-based models like VGGT and $π^3$ have achieved impressive results with simple architectures, yet they face…
Pretrained Models from "MaskTerial: A Foundation Model for Automated 2D Material Flake Detection"
This repo hosts the pretrained model weights from "MaskTerial: A Foundation Model for Automated 2D Material Flake Detection". The models follow the naming scheme "MODELTYPE_MODELNAME_MATERIAL.zip". The code for the model is on GitHub: MaskT…
How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction?
Gestures enable non-verbal human-robot communication, especially in noisy environments like agile production. Traditional deep learning-based gesture recognition relies on task-specific architectures using images, videos, or skeletal pose …
Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. Howe…
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
To operate safely, autonomous vehicles (AVs) need to detect and handle unexpected objects or anomalies on the road. While significant research exists for anomaly detection and segmentation in 2D, research progress in 3D is underexplored. E…
Acquisition of high-quality images for camera calibration in robotics applications via speech prompts
Accurate intrinsic and extrinsic camera calibration can be an important prerequisite for robotic applications that rely on vision as input. While there is ongoing research on enabling camera calibration using natural images, many systems i…
Panoptic-CUDAL: Rural Australia Point Cloud Dataset in Rainy Conditions
Existing autonomous driving datasets are predominantly oriented towards well-structured urban settings and favourable weather conditions, leaving the complexities of rural environments and adverse weather conditions largely unaddressed. Al…
OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction
Autonomous driving has the potential to significantly enhance productivity and provide numerous societal benefits. Ensuring robustness in these safety-critical systems is essential, particularly when vehicles must navigate adverse weather …
Fine-Tuning Image-Conditional Diffusion Models is Easier than you Think
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results…
MaskTerial: a foundation model for automated 2D material flake detection
MaskTerial is a foundation model for 2D material flake detection that uses synthetic pretraining and uncertainty modeling to enable fast adaptation to new materials with as few as 5–10 images.
Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization
3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, l…
Interactive4D: Interactive 4D LiDAR Segmentation
Interactive segmentation has an important role in facilitating the annotation process of future LiDAR datasets. Existing approaches sequentially segment individual objects at each LiDAR scan, repeating the process throughout the entire seq…
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results…
OoDIS: Anomaly Instance Segmentation and Detection Benchmark
Safe navigation of self-driving cars and robots requires a precise understanding of their environment. Training data for perception systems cannot cover the wide variety of objects that may appear during deployment. Thus, reliable identifi…
Cyto R-CNN and CytoNuke Dataset: Towards reliable whole-cell segmentation in bright-field histological images
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion g…
Point-VOS: Pointing Up Video Object Segmentation
Current state-of-the-art Video Object Segmentation (VOS) methods rely on dense per-object mask annotations both during training and testing. This requires time-consuming and costly video annotation mechanisms. We propose a novel Point-VOS …
An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs
This study investigates the application of ordinal regression methods for categorizing disease severity in chest radiographs. We propose a framework that divides the ordinal regression problem into three parts: a model, a target function, …
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Manually creating 3D environments for AR/VR applications is a complex process requiring expert knowledge in 3D modeling software. Pioneering works facilitate this process by generating room meshes conditioned on textual style descriptions.…
BUSSARD -- Better Understanding Social Situations for Autonomous Robot Decision-Making
We report on our effort to create a corpus dataset of different social context situations in an office setting for further disciplinary and interdisciplinary research in computer vision, psychology, and human-robot-interaction. For social …
Mask4Former: Mask Transformer for 4D Panoptic Segmentation
Accurately perceiving and tracking instances over time is essential for the decision-making processes of autonomous agents interacting safely in dynamic environments. With this intention, we propose Mask4Former for the challenging task of …
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work…
UGainS: Uncertainty Guided Anomaly Instance Segmentation
A single unexpected object on the road can cause an accident or may lead to injuries. To prevent this, we need a reliable mechanism for finding anomalous objects on the road. This task, called anomaly segmentation, can be a stepping stone …
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects er…
DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an i…
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenge…
TarViS: A Unified Approach for Target-based Video Segmentation
The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually ge…
3D Segmentation of Humans in Point Clouds with Synthetic Data
Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. To this end, we propose the task of joint 3D human semantic segmentation, instance segmentation and mu…
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is the different skeleton…
Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics
Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation cont…