Hujun Bao
UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction Open
This paper tackles the challenge of robust reconstruction, i.e., the task of reconstructing a 3D scene from a set of inconsistent multi-view images. Some recent works have attempted to simultaneously remove image inconsistencies and perfor…
Precise Action-to-Video Generation Through Visual Action Prompts Open
We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual dynamics across domains. Action-driven video generation faces a precisi…
Efficient workflow scheduling using an improved multi-objective memetic algorithm in cloud-edge-end collaborative framework Open
With the rapid advancement of large-scale model technologies, AI agent frameworks built on foundation models have become a central focus of artificial-intelligence research. In cloud-edge-end collaborative computing frameworks, efficient w…
HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers Open
3D human reconstruction and animation are long-standing topics in computer graphics and vision. However, existing methods typically rely on sophisticated dense-view capture and/or time-consuming per-subject optimization procedures. To addr…
SpatialTrackerV2: 3D Point Tracking Made Easy Open
We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point track…
InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes Open
Humans can naturally identify and mentally complete occluded objects in cluttered environments. However, imparting similar cognitive ability to robotics remains challenging even with advanced reconstruction techniques, which model scenes …
Regulation-aware freeform headlamp reflector design with differentiable ray tracing Open
Headlamp design is an essential aspect of the automotive industry, often relying on reflector systems composed of freeform surfaces. Traditional methods depend on manually adjusting surfaces to ensure reflected rays meet regulation require…
FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction Open
This paper addresses the challenge of reconstructing dynamic 3D scenes with complex motions. Some recent works define 3D Gaussian primitives in the canonical space and use deformation fields to map canonical primitives to observation space…
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations Open
Novel view synthesis (NVS) boosts immersive experiences in computer vision and graphics. Existing techniques, though progressed, rely on dense multi-view observations, restricting their application. This work takes on the challenge of reco…
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation Open
Scene-level 3D generation represents a critical frontier in multimedia and computer graphics, yet existing approaches either suffer from limited object categories or lack editing flexibility for interactive applications. In this paper, we …
GURecon: Learning Detailed 3D Geometric Uncertainties for Neural Surface Reconstruction Open
Neural surface representation has demonstrated remarkable success in the areas of novel view synthesis and 3D reconstruction. However, assessing the geometric quality of 3D reconstructions in the absence of ground truth mesh remains a sign…
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion Open
Recently, camera-based solutions have been extensively explored for scene semantic completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics due to frequent visual occlusions. T…
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation Open
This paper addresses the task of generating two-character online interactions. Previously, two main settings existed for two-character interaction generation: (1) generating one's motions based on the counterpart's complete motion sequence…
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds Open
Learning an agent model that behaves like humans, capable of jointly perceiving the environment, predicting the future, and taking actions from a first-person perspective, is a fundamental challenge in computer vision. Existing methods typic…
XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications Open
This paper presents a novel approach to Visual Inertial Odometry (VIO), focusing on the initialization and feature matching modules. Existing methods for initialization often suffer from either poor stability in visual Structure from Motio…
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation Open
The generation of talking avatars has achieved significant advancements in precise audio synchronization. However, crafting lifelike talking head videos requires capturing a broad spectrum of emotions and subtle facial expressions. Current…
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian Open
Reconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis. Existing methods that utilize environment maps to model reflections from distant lighting often struggle…
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models Open
This paper aims to tackle the problem of photorealistic view synthesis from vehicle sensor data. Recent advancements in neural scene representation have achieved notable success in rendering high-quality autonomous driving scenes, but the …
World-Grounded Human Motion Recovery via Gravity-View Coordinates Open
We present a novel method for recovering world-grounded human motion from monocular video. The main challenge lies in the ambiguity of defining the world coordinate system, which varies between sequences. Previous approaches attempt to all…
PC-Planner: Physics-Constrained Self-Supervised Learning for Robust Neural Motion Planning with Shape-Aware Distance Function Open
Motion Planning (MP) is a critical challenge in robotics, especially pertinent with the burgeoning interest in embodied artificial intelligence. Traditional MP methods often struggle with high-dimensional complexities. Recently neural m…
DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild Open
This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild. Traditional frameworks, such as ParticleSfM, address…
Representing Long Volumetric Video with Temporal Gaussian Hierarchy Open
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, like feature grids or point cloud sequences, to achie…
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding Open
In this paper, we propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. Unlike recent prior-free MVS methods that work in a pair-wise manner, our method simultaneously considers all the source images. Sp…
ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses Open
We tackle the efficiency problem of learning local feature matching. Recent advancements have given rise to purely CNN-based and transformer-based approaches, each augmented with deep learning techniques. While CNN-based methods often exce…
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events Open
Recent advances in event-based vision suggest that these systems complement traditional cameras by providing continuous observation without frame rate limitations and a high dynamic range, making them well-suited for correspondence tasks s…