Explanipedia

UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

Xiangyong Cao, Hongrui Wu, Ziyong Feng, Hujun Bao, Xiaowei Zhou, et al. · 2025

This paper tackles the challenge of robust reconstruction, i.e., the task of reconstructing a 3D scene from a set of inconsistent multi-view images. Some recent works have attempted to simultaneously remove image inconsistencies and perfor…

Precise Action-to-Video Generation Through Visual Action Prompts

Yuang Wang, Chao Wen, Haoyu Guo, Sida Peng, Minghan Qin, et al. · 2025

We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual dynamics across domains. Action-driven video generation faces a precisi…

Efficient workflow scheduling using an improved multi-objective memetic algorithm in cloud-edge-end collaborative framework

Guangzhang Cui, Wei Zhang, Weiwei Xu, Hujun Bao · 2025

With the rapid advancement of large-scale model technologies, AI agent frameworks built on foundation models have become a central focus of artificial-intelligence research. In cloud-edge-end collaborative computing frameworks, efficient w…

HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers

Zhiyuan Yu, Zhe Li, Hujun Bao, Can Yang, Xiaowei Zhou · 2025

Computer science Engineering

3D human reconstruction and animation are long-standing topics in computer graphics and vision. However, existing methods typically rely on sophisticated dense-view capture and/or time-consuming per-subject optimization procedures. To addr…

SpatialTrackerV2: 3D Point Tracking Made Easy

Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, et al. · 2025

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point track…

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

Zesong Yang, Bangbang Yang, Wenqi Dong, Caroline G. L. Cao, Liyuan Cui, et al. · 2025

Humans can naturally identify and mentally complete occluded objects in cluttered environments. However, imparting similar cognitive ability to robotics remains challenging even with advanced reconstruction techniques, which model scenes …

Regulation-aware freeform headlamp reflector design with differentiable ray tracing

Xuchen Wei, Yuchi Huo, Pengfei Shen, Yifan Peng, Hujun Bao, et al. · 2025

Computer science Physics

Headlamp design is an essential aspect of the automotive industry, often relying on reflector systems composed of freeform surfaces. Traditional methods depend on manually adjusting surfaces to ensure reflected rays meet regulation require…

FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peng Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, et al. · 2025

This paper addresses the challenge of reconstructing dynamic 3D scenes with complex motions. Some recent works define 3D Gaussian primitives in the canonical space and use deformation fields to map canonical primitives to observation space…

SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations

Songchun Zhang, Huiyao Xu, Guo Sitong, Zhongwei Xie, Hujun Bao, et al. · 2025

Novel view synthesis (NVS) boosts immersive experiences in computer vision and graphics. Existing techniques, despite their progress, rely on dense multi-view observations, which restricts their application. This work takes on the challenge of reco…

HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation

Wenqi Dong, Bangbang Yang, Zesong Yang, Yuan Li, Tao Hu, et al. · 2025

Scene-level 3D generation represents a critical frontier in multimedia and computer graphics, yet existing approaches either suffer from limited object categories or lack editing flexibility for interactive applications. In this paper, we …

GURecon: Learning Detailed 3D Geometric Uncertainties for Neural Surface Reconstruction

Zesong Yang, Ru Zhang, Jiale Shi, Zhi Yong Ai, Boming Zhao, et al. · 2025

Computer science Geology Mathematics

Neural surface representation has demonstrated remarkable success in the areas of novel view synthesis and 3D reconstruction. However, assessing the geometric quality of 3D reconstructions in the absence of a ground-truth mesh remains a sign…

SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion

Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang · 2025

Recently, camera-based solutions have been extensively explored for semantic scene completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics due to frequent visual occlusions. T…

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation

Zhi Cen, Huaijin Pi, Sida Peng, Qing Shuai, Yujun Shen, et al. · 2025

This paper addresses the task of generating two-character online interactions. Previously, two main settings existed for two-character interaction generation: (1) generating one's motions based on the counterpart's complete motion sequence…

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Lu Chen, Yizhou Wang, Shixiang Tang, Qi Ma, Tong He, et al. · 2025

Computer science Engineering

Learning an agent model that behaves like humans (capable of jointly perceiving the environment, predicting the future, and taking actions from a first-person perspective) is a fundamental challenge in computer vision. Existing methods typic…

XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications

Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, et al. · 2025

Computer science Physics

This paper presents a novel approach to Visual Inertial Odometry (VIO), focusing on the initialization and feature matching modules. Existing methods for initialization often suffer from either poor stability in visual Structure from Motio…

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Heyao Liu, Wenzhang Sun, Donglin Di, Shibo Sun, Jiahui Yang, et al. · 2025

Computer science Art

The generation of talking avatars has achieved significant advancements in precise audio synchronization. However, crafting lifelike talking head videos requires capturing a broad spectrum of emotions and subtle facial expressions. Current…

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Tao Xie, Xi Chen, Zhen Xu, Yan Xie, Yang Jin, et al. · 2024

Computer science Mathematics Physics

Reconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis. Existing methods that utilize environment maps to model reflections from distant lighting often struggle…

GURecon: Learning Detailed 3D Geometric Uncertainties for Neural Surface Reconstruction

Zesong Yang, Ru Zhang, Jiale Shi, Zhi Yong Ai, Boming Zhao, et al. · 2024

Computer science Mathematics

Neural surface representation has demonstrated remarkable success in the areas of novel view synthesis and 3D reconstruction. However, assessing the geometric quality of 3D reconstructions in the absence of a ground-truth mesh remains a sign…

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, et al. · 2024

Computer science Business Physics

This paper aims to tackle the problem of photorealistic view synthesis from vehicle sensor data. Recent advancements in neural scene representation have achieved notable success in rendering high-quality autonomous driving scenes, but the …

World-Grounded Human Motion Recovery via Gravity-View Coordinates

Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, et al. · 2024

Computer science Mathematics

We present a novel method for recovering world-grounded human motion from monocular video. The main challenge lies in the ambiguity of defining the world coordinate system, which varies between sequences. Previous approaches attempt to all…

PC-Planner: Physics-Constrained Self-Supervised Learning for Robust Neural Motion Planning with Shape-Aware Distance Function

Xujie Shen, Haocheng Peng, Zesong Yang, Juzhan Xu, Hujun Bao, et al. · 2024

Computer science Mathematics Chemistry

Motion Planning (MP) is a critical challenge in robotics, especially pertinent with the burgeoning interest in embodied artificial intelligence. Traditional MP methods often struggle with high-dimensional complexities. Recently neural m…

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild

Weicai Ye, Xinyu Chen, Ronghui Zhan, Di Huang, Xiaoshui Huang, et al. · 2024

Computer science Geography Mathematics

This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild. Traditional frameworks, such as ParticleSfM (Zhao et al., 2022), address…

Representing Long Volumetric Video with Temporal Gaussian Hierarchy

Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, et al. · 2024

Computer science Physics Economics

This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, like feature grids or point cloud sequences, to achie…

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Yitong Dong, Yijin Li, Zhaoyang Huang, Weikang Bian, Jingbo Liu, et al. · 2024

Computer science Engineering

In this paper, we propose a novel multi-view stereo (MVS) framework that eliminates the depth-range prior. Unlike recent prior-free MVS methods that work in a pairwise manner, our method simultaneously considers all the source images. Sp…

ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

Junjie Ni, Guofeng Zhang, Guanglin Li, Yijin Li, Xinyang Liu, et al. · 2024

Computer science Mathematics Engineering

We tackle the efficiency problem of learning local feature matching. Recent advancements have given rise to purely CNN-based and transformer-based approaches, each augmented with deep learning techniques. While CNN-based methods often exce…

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, et al. · 2024

Computer science Geography Mathematics

Recent advances in event-based vision suggest that these systems complement traditional cameras by providing continuous observation without frame rate limitations and a high dynamic range, making them well-suited for correspondence tasks s…

Hujun Bao