Kwan-Yee Lin
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Hiking on complex trails demands balance, agility, and adaptive decision-making over unpredictable terrain. Current humanoid research remains fragmented and inadequate for hiking: locomotion focuses on motor skills without long-term goals …
TimeWalker: Personalized Neural Space for Lifelong Head Avatars
We present TimeWalker, a novel framework that models realistic, full-scale 3D head avatars of a person on a lifelong scale. Unlike current human head avatar pipelines that capture identity at the momentary level (e.g., instant photography or …
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior
Text-to-3D generation has achieved remarkable success via large-scale text-to-image diffusion models. Nevertheless, there is no paradigm for scaling up the methodology to urban scale. Urban scenes, characterized by numerous elements, intri…
CosmicMan: A Text-to-Image Foundation Model for Humans
We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and text-image misalignment fo…
RNNPose: 6-DoF Object Pose Estimation via Recurrent Correspondence Field Estimation and Pose Optimization
6-DoF object pose estimation from a monocular image is a challenging problem, where a post-refinement procedure is generally needed for high-precision estimation. In this paper, we propose a framework, dubbed RNNPose, based on a recurrent …
Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
The advancements in neural rendering have increased the need for techniques that enable intuitive editing of 3D objects represented as neural implicit surfaces. This paper introduces a novel neural algorithm for parameterizing neural impli…
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation
Human generation has achieved significant progress. Nonetheless, existing methods still struggle to synthesize specific regions such as faces and hands. We argue that the main reason is rooted in the training data. A holistic human dataset…
Urban Radiance Field Representation with Deformable Neural Mesh Primitives
Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we…
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
Realistic human-centric rendering plays a key role in both computer vision and computer graphics. Rapid progress has been made in the algorithm aspect over the years, yet existing human-centric rendering datasets and benchmarks are rather …
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios. One of the vital…
MonoHuman: Animatable Human Neural Field from Monocular Video
Animating virtual avatars with free-view control is crucial for various applications like virtual reality and digital entertainment. Previous studies have attempted to utilize the representation power of the neural radiance field (NeRF) to…
Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings
Reconstructing 3D human heads in low-view settings presents technical challenges, mainly due to the pronounced risk of overfitting with limited views and high-frequency signals. To address this, we propose geometry decomposition and adopt …
StyleGAN-Human: A Data-Centric Odyssey of Human Generation
Unconditional human image generation is an important task in vision and graphics, which enables various applications in the creative industry. Existing studies in this field mainly focus on "network engineering" such as designing new compo…
Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis
This work aims to use a general deep learning framework to synthesize free-viewpoint images of arbitrary human performers, requiring only a sparse set of camera views as inputs and skirting per-case fine-tuning. The large variation…
Simulating Fluids in Real-World Still Images
In this work, we tackle the problem of real-world fluid animation from a still image. The key to our system is a surface-based layered representation derived from video decomposition, where the scene is decoupled into a surface fluid laye…
Learning a Structured Latent Space for Unsupervised Point Cloud Completion
Unsupervised point cloud completion aims at estimating the corresponding complete point cloud of a partial point cloud in an unpaired manner. It is a crucial but challenging problem since there is no paired partial-complete supervision tha…
RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization
6-DoF object pose estimation from a monocular image is challenging, and a post-refinement procedure is generally needed for high-precision estimation. In this paper, we propose a framework based on a recurrent neural network (RNN) for obje…
Inverting Generative Adversarial Renderer for Face Reconstruction
Given a monocular face image as input, 3D face geometry reconstruction aims to recover a corresponding 3D face mesh. Recently, both optimization-based and learning-based face reconstruction methods have taken advantage of the emerging diff…
Semantic Scene Completion via Integrating Instances and Scene in-the-Loop
Semantic Scene Completion aims at reconstructing a complete 3D scene with precise voxel-wise semantics from a single-view depth or RGBD image. It is a crucial but challenging problem for indoor scene understanding. In this work, we present…
SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks
Recent learning-based LiDAR odometry methods have demonstrated their competitiveness. However, most methods still face two substantial challenges: 1) the 2D projection representation of LiDAR data cannot effectively encode 3D structures fr…
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-a…
3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. Since the computation…
TRB: A Novel Triplet Representation for Understanding 2D Human Body
Human pose and shape are two important components of the 2D human body. However, how to efficiently represent both of them in images is still an open question. In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D…
Make a Face: Towards Arbitrary High Fidelity Face Manipulation
Recent studies have shown remarkable success in the face manipulation task with the advance of the GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity. In this work, we propose Additive Focal Vari…
Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale indoor 3D datasets and sophisticated network architectures. However, the generalizability to different environme…
Hallucinated-IQA: No-Reference Image Quality Assessment via Adversarial Learning
No-reference image quality assessment (NR-IQA) is a fundamental yet challenging task in the low-level computer vision community. The difficulty is particularly pronounced for the limited information, for which the corresponding reference for c…