Guocheng Qian
ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation
Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a…
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). Exist…
Wonderland: Navigating 3D Scenes from a Single Image
How can one efficiently generate high-quality, wide-scope 3D scenes from arbitrary single images? Existing methods suffer several drawbacks, such as requiring multi-view data, time-consuming per-scene optimization, distorted geometry in oc…
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. …
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
Numerous works have recently integrated 3D camera control into foundational text-to-video models, but the resulting camera control is often imprecise, and video generation quality suffers. In this work, we analyze camera motion from a firs…
FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
Point cloud frame interpolation is a challenging task that involves accurate scene flow estimation across frames and maintaining the geometry structure. Prevailing techniques often rely on pre-trained motion estimators or intensive testing…
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning N…
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream…
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalize…
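The key idea in GES is to replace the Gaussian kernel of 3D Gaussian Splatting with a generalized exponential function, whose extra shape parameter lets a single primitive cover sharper-edged regions. A minimal 1D sketch of that kernel family (the paper's exact parameterization and normalization may differ):

```python
import numpy as np

def generalized_exponential(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized exponential kernel exp(-(|x - mu| / alpha)^beta).

    beta = 2 recovers the (unnormalized) Gaussian profile; larger beta
    yields a flatter top with sharper edges, which is why fewer such
    primitives can suffice compared to a Gaussian mixture.
    """
    return np.exp(-np.abs((x - mu) / alpha) ** beta)

x = np.linspace(-3.0, 3.0, 121)
gaussian_like = generalized_exponential(x, beta=2.0)
box_like = generalized_exponential(x, beta=8.0)
# Inside |x| < alpha the beta=8 kernel stays near 1 (box-like plateau);
# outside it decays much faster than the Gaussian.
```

Varying `beta` per primitive is the essence of the "generalized" splatting; the rendering pipeline around it is unchanged in spirit from Gaussian Splatting.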
SPAD: Spatially Aware Multiview Diffusers
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross…
AToM: Amortized Text-to-Mesh using 2D Diffusion
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization an…
Diffusion Priors for Dynamic View Synthesis from Monocular Videos
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos. Existing methods struggle to distinguish between motion and structure, particularly in scenarios where camera poses are either unknown …
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., vid…
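The memory saving in reversible finetuning rests on the standard reversible-block identity: because inputs can be recomputed exactly from outputs, intermediate activations need not be cached for backpropagation. A toy sketch of that identity with stand-in sub-networks (Dr$^2$Net's actual design interpolates between the pretrained residual form and this reversible form via coupling coefficients, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4)) * 0.1
W_g = rng.standard_normal((4, 4)) * 0.1

def f(h):  # stand-in for an arbitrary sub-network
    return np.tanh(h @ W_f)

def g(h):
    return np.tanh(h @ W_g)

def rev_forward(x1, x2):
    # Reversible coupling: each half is updated from the other.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Exact inversion: activations can be reconstructed, not stored.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

Since the inverse is exact, a reversible backbone trades a little recomputation for activation memory that is constant in depth.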
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild, using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce…
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level. However, existing approaches often rely on expensive human annotations as supervision for m…
LLM as A Robotic Brain: Unifying Egocentric Memory and Control
Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts…
Virulence capacity of different Aspergillus species from invasive pulmonary aspergillosis
Introduction The opportunistic filamentous fungus Aspergillus causes invasive pulmonary aspergillosis (IPA) that often turns into a fatal infection in immunocompromised hosts. However, the virulence capacity of different Aspergillus specie…
Quantitative and Real‐Time Evaluation of Human Respiration Signals with a Shape‐Conformal Wireless Sensing System
Respiration signals reflect many underlying health conditions, including cardiopulmonary functions, autonomic disorders, and respiratory distress; therefore, continuous measurement of respiration is needed in various cases. Unfortunately, th…
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
While Transformers have achieved impressive success in natural language processing and computer vision, their performance on 3D point clouds is relatively poor. This is mainly due to the limitation of Transformers: a demanding need for ext…
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large po…
When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search
The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS method called TNAS (NAS with trees), which improves search efficiency by exploring only a small number …
ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit a…
PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks
The effectiveness of learning-based point cloud upsampling pipelines heavily relies on the upsampling modules and feature extractors used therein. For the point upsampling module, we propose a novel model called NodeShuffle, which uses a G…
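NodeShuffle pairs GCN feature expansion with a periodic shuffle, the point-cloud analogue of PixelShuffle. A minimal sketch of the shuffle step alone, assuming a preceding GCN layer has already expanded each node's features from C to r·C channels (that expansion is omitted here):

```python
import numpy as np

def periodic_shuffle(features, r):
    """Rearrange (N, r*C) expanded node features into (r*N, C) points.

    Each input node contributes r new points, one per channel group,
    analogous to how PixelShuffle turns channel depth into spatial
    resolution for images.
    """
    n, rc = features.shape
    assert rc % r == 0, "channel count must be divisible by the ratio r"
    c = rc // r
    return features.reshape(n, r, c).reshape(n * r, c)

feats = np.arange(12.0).reshape(2, 6)  # N=2 nodes, r*C=6 with r=3, C=2
up = periodic_shuffle(feats, r=3)
# up has shape (6, 2): every input node yields r=3 upsampled points.
```

In PU-GCN the quality of those r·C features, produced by graph convolutions over local neighborhoods, is what distinguishes NodeShuffle from naive duplication-based upsampling.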
DeepGCNs: Making GCNs Go as Deep as CNNs
Convolutional neural networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, activity understanding, to name just a few. One key enabling f…
Leveraging Graph Convolutional Networks for Point Cloud Upsampling
Due to hardware limitations, 3D sensors like LiDAR often produce sparse and noisy point clouds. Point cloud upsampling is the task of converting such point clouds into dense and clean ones. This thesis tackles the problem of point cloud up…
SGAS: Sequential Greedy Architecture Search
Architecture design has become a crucial component of successful deep learning. Recent progress in automatic neural architecture search (NAS) shows a lot of promise. However, discovered architectures often fail to generalize in the final e…
Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline
Imaging is usually a mixture problem of incomplete color sampling, noise degradation, and limited resolution. This mixture problem is typically solved by a sequential solution that applies demosaicing (DM), denoising (DN), and super-resolu…
Trinity of Pixel Enhancement: a Joint Solution for Demosaicking, Denoising and Super-Resolution
Demosaicing, denoising and super-resolution (SR) are of practical importance in digital image processing and have been studied independently over the past decades. Despite the recent improvement of learning-based image processing methods i…