Esa Rahtu
Evaluating Fisheye-Compatible 3D Gaussian Splatting Methods on Real Images Beyond 180 Degree Field of View
We present the first evaluation of fisheye-based 3D Gaussian Splatting methods, Fisheye-GS and 3DGUT, on real images with fields of view exceeding 180 degrees. Our study covers both indoor and outdoor scenes captured with 200 degree fisheye…
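For intuition on why this setting needs dedicated camera models at all: a pinhole projection (r = f·tan θ) diverges as a ray approaches 90 degrees off-axis, while common fisheye models such as the equidistant projection (r = f·θ) keep mapping rays to finite pixel coordinates well past that point, which is what makes fields of view beyond 180 degrees representable. The Python sketch below illustrates only this generic equidistant model; the exact calibrations used by Fisheye-GS or 3DGUT may differ, and the focal length and principal point values are made up for the example.

import numpy as np

def equidistant_project(ray_dir, f, cx, cy):
    """Project a unit ray (camera frame, +z optical axis) with the equidistant fisheye model.

    r = f * theta, where theta is the angle to the optical axis. Unlike the
    pinhole model (r = f * tan(theta)), this stays finite for theta >= 90 deg,
    which is why fields of view beyond 180 degrees remain representable.
    """
    x, y, z = ray_dir
    theta = np.arccos(np.clip(z, -1.0, 1.0))   # angle to optical axis
    phi = np.arctan2(y, x)                     # azimuth around the axis
    r = f * theta                              # equidistant radial mapping
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

# A ray 100 degrees off-axis (outside any pinhole image) still lands on the sensor:
ray = np.array([np.sin(np.radians(100)), 0.0, np.cos(np.radians(100))])
u, v = equidistant_project(ray, f=300.0, cx=960.0, cy=960.0)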
Hall_In Indoor Scene for FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking
The fifth indoor scene for the FIORD dataset.
AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones
Geometric priors are often used to enhance 3D reconstruction. With many smartphones featuring low-resolution depth sensors and the prevalence of off-the-shelf monocular geometry estimators, incorporating geometric priors as regularization …
L2C -- Learning to Learn to Compress
In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural networks for generalization performance. However, at inference tim…
Temporally Aligned Audio for Video with Autoregression
We introduce V-AURA, the first autoregressive model to achieve high temporal alignment and relevance in video-to-audio generation. V-AURA uses a high-framerate visual feature extractor and a cross-modal audio-visual feature fusion strategy…
UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM
Recent advancements in monocular neural depth estimation, particularly those achieved by the UniDepth network, have prompted the investigation of integrating UniDepth within a Gaussian splatting framework for monocular SLAM. This study pre…
DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
High-fidelity 3D reconstruction of common indoor scenes is crucial for VR and AR applications. 3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high renderin…
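As a rough illustration of what "depth and normal priors" mean in this context (a generic sketch, not DN-Splatter's actual loss design, whose weighting and robust terms are not reproduced here): rendered depth and normal maps can be penalized against sensor or monocular-estimator priors alongside the usual photometric term.

import torch
import torch.nn.functional as F

def prior_regularized_loss(rendered_rgb, gt_rgb,
                           rendered_depth, prior_depth,
                           rendered_normal, prior_normal,
                           lambda_d=0.2, lambda_n=0.1):
    """Generic photometric + depth/normal prior objective for splat optimization.

    Illustrative only: the weights and loss shapes are assumptions, not the
    paper's exact formulation.
    """
    photo = torch.abs(rendered_rgb - gt_rgb).mean()           # L1 color term
    depth = torch.abs(rendered_depth - prior_depth).mean()    # depth prior term
    normal = (1.0 - F.cosine_similarity(
        rendered_normal, prior_normal, dim=-1)).mean()        # normal alignment term
    return photo + lambda_d * depth + lambda_n * normal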
Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera…
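To make the blur-compensation idea concrete, one common way to model motion blur in a differentiable renderer is to average sharp renders from several camera poses sampled across the exposure interval. The sketch below shows only that generic averaging step, with render_fn and the pose sampling treated as placeholders; it does not reproduce the paper's trajectory model, rolling-shutter handling, or pose optimization.

import torch

def render_blurred(render_fn, poses_during_exposure):
    """Approximate motion blur by averaging sharp renders along the exposure.

    render_fn(pose) -> (H, W, 3) image tensor; poses_during_exposure is a list
    of camera poses sampled over the shutter interval. Purely illustrative.
    """
    frames = [render_fn(pose) for pose in poses_during_exposure]
    return torch.stack(frames, dim=0).mean(dim=0)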
GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting
This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in …
Synchformer: Efficient Synchronization from Sparse Cues
Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that…
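For readers unfamiliar with the task, audio-visual synchronization is often posed as predicting a (discretized) temporal offset between the audio and visual streams from their features. The toy module below frames it as offset classification over a fixed set of bins; the feature dimensions, fusion architecture, and class count are placeholders and are not Synchformer's actual design.

import torch
import torch.nn as nn

class OffsetClassifier(nn.Module):
    """Toy head that classifies the audio-visual offset into discrete bins."""

    def __init__(self, feat_dim=512, num_offsets=21):
        super().__init__()
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(feat_dim, num_offsets)   # one logit per offset bin

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (B, T_a, D), visual_feats: (B, T_v, D)
        tokens = torch.cat([audio_feats, visual_feats], dim=1)
        fused = self.fuse(tokens)
        return self.head(fused.mean(dim=1))            # (B, num_offsets) logits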
NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines
The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally …
Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis
We present CG-NeRF, a cascaded and generalizable neural radiance fields method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering spee…
MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis (iPhone Part 3)
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structur…
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images. However, our survey of 37 recent papers reveals that many work…
FinnWoodlands Dataset
While the availability of large and diverse datasets has contributed to significant breakthroughs in autonomous driving and indoor applications, forestry applications are still lagging behind and new forest datasets would most certainly co…
MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation
Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the sy…
BS3D: Building-scale 3D Reconstruction from RGB-D Images
Various datasets have been proposed for simultaneous localization and mapping (SLAM) and related problems. Existing datasets often include small environments, have incomplete ground truth, or lack important sensor data, such as depth and i…
PanDepth: Joint Panoptic Segmentation and Depth Completion
Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved. Multi-task models provide different types of outputs for a given scene, yielding a more holistic re…
Bridging the Gap Between Image Coding for Machines and Humans
Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual qualit…
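To unpack that objective in a small sketch: image coding for machines is usually trained as a trade-off between the estimated bitrate of the coded representation and the loss of a downstream machine-vision network, with an optional pixel-level distortion term when the decoded image should also look acceptable to humans. The weights and function below are illustrative assumptions, not the formulation used in the paper.

import torch

def icm_objective(rate_bits, task_loss, distortion=None,
                  lambda_task=1.0, lambda_dist=0.1):
    """Generic rate / task-loss trade-off for image coding for machines.

    rate_bits: estimated bits of the coded representation (e.g. from an
    entropy model); task_loss: loss of the machine-vision network on the
    decoded image; distortion: optional pixel-level term for human viewing.
    Weights are illustrative placeholders.
    """
    loss = rate_bits + lambda_task * task_loss
    if distortion is not None:
        loss = loss + lambda_dist * distortion
    return loss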
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed for synchronisation cues may be spatially small and may occur only infrequently during a many se…