Guocheng Qian
ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation
Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a…
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). Exist…
Wonderland: Navigating 3D Scenes from a Single Image
How can one efficiently generate high-quality, wide-scope 3D scenes from arbitrary single images? Existing methods suffer several drawbacks, such as requiring multi-view data, time-consuming per-scene optimization, distorted geometry in oc…
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. …
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
Numerous works have recently integrated 3D camera control into foundational text-to-video models, but the resulting camera control is often imprecise, and video generation quality suffers. In this work, we analyze camera motion from a firs…
FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
Point cloud frame interpolation is a challenging task that involves accurate scene flow estimation across frames and maintaining the geometry structure. Prevailing techniques often rely on pre-trained motion estimators or intensive testing…
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning N…
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream…
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalize…
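The key idea in GES is to replace the Gaussian kernel of 3D Gaussian Splatting with a generalized exponential function, whose extra shape parameter lets a single primitive cover sharper-edged regions. A minimal 1D sketch of that kernel family (the paper's exact parameterization and normalization may differ):

```python
import numpy as np

def generalized_exponential(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized exponential kernel exp(-(|x - mu| / alpha)^beta).

    beta = 2 recovers the (unnormalized) Gaussian profile; larger beta
    yields a flatter top with sharper edges, which is why fewer such
    primitives can suffice compared to a Gaussian mixture.
    """
    return np.exp(-np.abs((x - mu) / alpha) ** beta)

x = np.linspace(-3.0, 3.0, 121)
gaussian_like = generalized_exponential(x, beta=2.0)
box_like = generalized_exponential(x, beta=8.0)
# Inside |x| < alpha the beta=8 kernel stays near 1 (box-like plateau);
# outside it decays much faster than the Gaussian.
```

Varying `beta` per primitive is the essence of the "generalized" splatting; the rendering pipeline around it is unchanged in spirit from Gaussian Splatting.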
SPAD: Spatially Aware Multiview Diffusers
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross…
AToM: Amortized Text-to-Mesh using 2D Diffusion
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization an…
Diffusion Priors for Dynamic View Synthesis from Monocular Videos
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos. Existing methods struggle to distinguish between motion and structure, particularly in scenarios where camera poses are either unknown …
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., vid…
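The memory saving in reversible finetuning rests on the standard reversible-block identity: because inputs can be recomputed exactly from outputs, intermediate activations need not be cached for backpropagation. A toy sketch of that identity with stand-in sub-networks (Dr$^2$Net's actual design interpolates between the pretrained residual form and this reversible form via coupling coefficients, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4)) * 0.1
W_g = rng.standard_normal((4, 4)) * 0.1

def f(h):  # stand-in for an arbitrary sub-network
    return np.tanh(h @ W_f)

def g(h):
    return np.tanh(h @ W_g)

def rev_forward(x1, x2):
    # Reversible coupling: each half is updated from the other.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Exact inversion: activations can be reconstructed, not stored.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

Since the inverse is exact, a reversible backbone trades a little recomputation for activation memory that is constant in depth.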
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild, using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce…
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level. However, existing approaches often rely on expensive human annotations as supervision for m…
LLM as A Robotic Brain: Unifying Egocentric Memory and Control
Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts…
Virulence capacity of different Aspergillus species from invasive pulmonary aspergillosis
Introduction The opportunistic filamentous fungus Aspergillus causes invasive pulmonary aspergillosis (IPA) that often turns into a fatal infection in immunocompromised hosts. However, the virulence capacity of different Aspergillus specie…
Quantitative and Real‐Time Evaluation of Human Respiration Signals with a Shape‐Conformal Wireless Sensing System
Respiration signals reflect many underlying health conditions, including cardiopulmonary functions, autonomic disorders, and respiratory distress; therefore, continuous measurement of respiration is needed in various cases. Unfortunately, th…
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
While Transformers have achieved impressive success in natural language processing and computer vision, their performance on 3D point clouds is relatively poor. This is mainly due to the limitation of Transformers: a demanding need for ext…
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large po…
When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search
The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS method called TNAS (NAS with trees), which improves search efficiency by exploring only a small number …
ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit a…
PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks
The effectiveness of learning-based point cloud upsampling pipelines heavily relies on the upsampling modules and feature extractors used therein. For the point upsampling module, we propose a novel model called NodeShuffle, which uses a G…
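NodeShuffle pairs GCN feature expansion with a periodic shuffle, the point-cloud analogue of PixelShuffle. A minimal sketch of the shuffle step alone, assuming a preceding GCN layer has already expanded each node's features from C to r·C channels (that expansion is omitted here):

```python
import numpy as np

def periodic_shuffle(features, r):
    """Rearrange (N, r*C) expanded node features into (r*N, C) points.

    Each input node contributes r new points, one per channel group,
    analogous to how PixelShuffle turns channel depth into spatial
    resolution for images.
    """
    n, rc = features.shape
    assert rc % r == 0, "channel count must be divisible by the ratio r"
    c = rc // r
    return features.reshape(n, r, c).reshape(n * r, c)

feats = np.arange(12.0).reshape(2, 6)  # N=2 nodes, r*C=6 with r=3, C=2
up = periodic_shuffle(feats, r=3)
# up has shape (6, 2): every input node yields r=3 upsampled points.
```

In PU-GCN the quality of those r·C features, produced by graph convolutions over local neighborhoods, is what distinguishes NodeShuffle from naive duplication-based upsampling.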
DeepGCNs: Making GCNs Go as Deep as CNNs
Convolutional neural networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, activity understanding, to name just a few. One key enabling f…
Leveraging Graph Convolutional Networks for Point Cloud Upsampling
Due to hardware limitations, 3D sensors like LiDAR often produce sparse and noisy point clouds. Point cloud upsampling is the task of converting such point clouds into dense and clean ones. This thesis tackles the problem of point cloud up…
SGAS: Sequential Greedy Architecture Search
Architecture design has become a crucial component of successful deep learning. Recent progress in automatic neural architecture search (NAS) shows a lot of promise. However, discovered architectures often fail to generalize in the final e…
Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline
Imaging is usually a mixture problem of incomplete color sampling, noise degradation, and limited resolution. This mixture problem is typically solved by a sequential solution that applies demosaicing (DM), denoising (DN), and super-resolu…
Trinity of Pixel Enhancement: a Joint Solution for Demosaicking, Denoising and Super-Resolution
Demosaicing, denoising and super-resolution (SR) are of practical importance in digital image processing and have been studied independently over the past decades. Despite the recent improvement of learning-based image processing methods i…