Ying-Cong Chen
Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation
3D point cloud segmentation aims to assign semantic labels to individual points in a scene for fine-grained spatial understanding. Existing methods typically adopt data augmentation to alleviate the burden of large-scale annotation. Howeve…
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Computational replication of Chinese calligraphy remains challenging. Existing methods falter, either creating high-quality isolated characters while ignoring page-level aesthetics like ligatures and spacing, or attempting page synthesis a…
Iris3D: 3D Generation via Synchronized Diffusion Distillation
We introduce Iris3D, a novel 3D content generation system that generates vivid textures and detailed 3D shapes while preserving the input information. Our system integrates a Multi-View Large Reconstruction Model (MVLRM [Li et al. 2023b])…
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Recent advancements in generative models have enabled 3D urban scene generation from satellite imagery, unlocking promising applications in gaming, digital twins, and beyond. However, most existing methods rely heavily on neural rendering …
DivPro: diverse protein sequence design with direct structure recovery guidance
Motivation Structure-based protein design is crucial for designing proteins with novel structures and functions, which aims to generate sequences that fold into desired structures. Current deep learning-based methods primarily focus on tra…
FlexPainter: Flexible and Multi-View Consistent Texture Generation
Texture map production is an important part of 3D modeling and determines the rendering quality. Recently, diffusion-based methods have opened a new way for texture generation. However, restricted control flexibility and limited prompt mod…
Advancing high-fidelity 3D and Texture Generation with 2.5D latents
Despite the availability of large-scale 3D datasets and advancements in 3D generative models, the complexity and uneven quality of 3D geometry and texture data continue to hinder the performance of 3D generation techniques. In most existin…
Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot
The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer …
DiMeR: Disentangled Mesh Reconstruction Model
We propose DiMeR, a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction. Existing methods confront two persistent obstacles: (i) textures can conceal geometric errors, i.e., visual…
Towards Generalizable Multi-Camera 3D Object Detection via Perspective Rendering
Detecting and localizing objects in 3D space using multiple cameras, known as Multi-Camera 3D Object Detection (MC3D-Det), has gained prominence with the advent of bird's-eye view (BEV) approaches. However, these methods often struggle wit…
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Poster design is a critical medium for visual communication. Prior work has explored automatic poster design using deep learning techniques, but these approaches lack text accuracy, user customization, and aesthetic appeal, limiting their …
Efficient Training-Free High-Resolution Synthesis with Energy Rectification in Diffusion Models
Diffusion models have achieved remarkable progress across various visual generation tasks. However, their performance significantly declines when generating content at resolutions higher than those used during training. Although numerous m…
TransPixeler: Advancing Text-to-Video Generation with Transparency
Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alpha channels for transparency, remains a challenge …
Dual-Balancing for Multi-Task Learning
Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation is the core of the two tasks, as an ideal conditional distribution transfer function from intrinsic properties to RGB images. De…
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-e…
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving
Photorealistic 4D reconstruction of street scenes is essential for developing real-world simulators in autonomous driving. However, most existing methods perform this task offline and rely on time-consuming iterative processes, limiting th…
Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
Recent advances in video generation have shown promise for generating future scenarios, critical for planning and control in autonomous driving and embodied intelligence. However, real-world applications demand more than visually plausible…
LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images
Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, current reconstruction methods often rely on explicit camera pose estimation or fixed viewpoints, restricting …
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
In this work, we introduce FlexGen, a flexible framework designed to generate controllable and consistent multi-view images, conditioned on a single-view image, or a text prompt, or both. FlexGen tackles the challenges of controllable mult…
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
We present OmniBooth, an image generation framework that enables spatial control with instance-level multi-modal customization. For all instances, the multimodal instruction can be described through text prompts or image references. Given …
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
In the realm of image generation, creating customized images from a visual prompt with additional textual instruction emerges as a promising endeavor. However, existing methods, both tuning-based and tuning-free, struggle with interpreting t…
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks. However, existing methods often uncritically use the original diffusion f…
Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
Remote photoplethysmography (rPPG) is gaining prominence for its non-invasive approach to monitoring physiological signals using only cameras. Despite its promise, the adaptability of rPPG models to new, unseen domains is hindered due to t…
DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings…
From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model
We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving appl…
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dens…
Feeding Habits of Scomber japonicus Inferred by Stable Isotope and Fatty Acid Analyses
Scomber japonicus is widely distributed off the coast of Japan and in the northwestern Pacific, and is an important target for fisheries. To reveal the differences in diet shifts and niche changes of S. japonicus, we collected samples in th…
Ontogenetic Variation in the Trophic and Mercury Levels of Japanese Anchovy in the High Seas of the Northwestern Pacific Ocean
The aim of this study was to explore the connection between growth and feeding ecology and mercury (Hg) levels in Japanese anchovy (Engraulis japonicus). We measured the amounts of Hg and stable carbon and nitrogen isotopes in the muscle o…