Chi‐Wing Fu
Hand-Shadow Poser
Hand shadow art is a captivating art form, creatively using hand shadows to reproduce expressive shapes on the wall. In this work, we study an inverse problem: given a target shape, find the poses of left and right hands that together best…
Visualization-Driven Illumination for Density Plots
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions…
Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation
Recent Large Multimodal Models have demonstrated remarkable reasoning capabilities, especially in solving complex mathematical problems and realizing accurate spatial perception. Our key insight is that these emerging abilities can natural…
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Multimodal large language models (MLLMs) have advanced perception across text, vision, and audio, yet they often struggle with structured cross-modal reasoning, particularly when integrating audio and visual signals. We introduce EchoInk-R…
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Controversial contents largely inundate the Internet, infringing various cultural norms and child protection standards. Traditional Image Content Moderation (ICM) models fall short in producing precise moderation decisions for diverse stan…
HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions
Scene-level point cloud registration is very challenging when considering dynamic foregrounds. Existing indoor datasets mostly assume rigid motions, so the trained models cannot robustly handle scenes with non-rigid motions. On the other h…
Not-So-Optimal Transport Flows for 3D Point Cloud Generation
Learning generative models of 3D point clouds is one of the fundamental problems in 3D generative learning. One of the key properties of point clouds is their permutation invariance, i.e., changing the order of points in a point cloud does…
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motio…
Overcoming Support Dilution for Robust Few-shot Semantic Segmentation
Few-shot Semantic Segmentation (FSS) is a challenging task that utilizes limited support images to segment associated unseen objects in query images. However, recent FSS methods are observed to perform worse, when enlarging the number of s…
GeoManip: Geometric Constraints as General Interfaces for Robot Manipulation
We present GeoManip, a framework to enable generalist robots to leverage essential conditions derived from object and part relationships, as geometric constraints, for robot manipulation. For example, cutting the carrot requires adhering t…
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
Shadows are often under-considered or even ignored in image editing applications, limiting the realism of the edited results. In this paper, we introduce MetaShadow, a three-in-one versatile framework that enables detection, removal, and c…
CRAYM: Neural Field Optimization via Camera RAY Matching
We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesi…
Learn to Create Simple LEGO Micro Buildings
This paper presents the first learning-based generative pipeline for effectively creating 3D LEGO® models. This task is very challenging due to the lack of dedicated representations and datasets for learning coherently-connected bricks a…
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
Panoptic lifting is an effective technique to address the 3D panoptic segmentation task by unprojecting 2D panoptic segmentations from multi-views to 3D scene. However, the quality of its results largely depends on the 2D segmentations, wh…
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
Observing that the key for robotic action planning is to understand the target-object motion when its associated part is manipulated by the end effector, we propose to generate the 3D object-part scene flow and extract its transformations …
Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
Shadows are created when light encounters obstacles, resulting in regions of reduced illumination. In computer vision, detecting, removing, and generating shadows are critical tasks for improving scene understanding, enhancing image qualit…
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
This paper addresses the limitations of adverse weather image restoration approaches trained on synthetic data when applied to real-world scenarios. We formulate a semi-supervised learning framework employing vision-language models to enha…
PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training
Data plays a crucial role in training learning-based methods for 3D point cloud registration. However, the real-world dataset is expensive to build, while rendering-based synthetic data suffers from domain gaps. In this work, we present Po…
CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization
This paper introduces a new approach based on a coupled representation and a neural volume optimization to implicitly perform 3D shape editing in latent space. This work has three innovations. First, we design the coupled neural shape (CNS…
Object-level Scene Deocclusion
Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a fou…
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Reconstructing 3D hand mesh robustly from a single image is very challenging, due to the lack of diversity in existing real-world datasets. While data synthesis helps relieve the issue, the syn-to-real gap still hinders its usage. In this …
SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation
Estimating 3D hand mesh from RGB images is a longstanding track, in which occlusion is one of the most challenging problems. Existing attempts towards this task often fail when the occlusion dominates the image space. In this paper, we pro…
The Test of Time (ToT) Awards
Computer Graphics and Applications (IEEE CG&A) Test of Time (ToT) Award was introduced in 2021, aiming to recognize regular or special issue articles published by the magazine that have made profound and…
Make-A-Shape: a Ten-Million-scale 3D Shape Model
Significant progress has been made in training large generative models for natural language and images. Yet, the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, no…