Ying-Cong Chen
Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation
3D point cloud segmentation aims to assign semantic labels to individual points in a scene for fine-grained spatial understanding. Existing methods typically adopt data augmentation to alleviate the burden of large-scale annotation. Howeve…
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Computational replication of Chinese calligraphy remains challenging. Existing methods falter, either creating high-quality isolated characters while ignoring page-level aesthetics like ligatures and spacing, or attempting page synthesis a…
Iris3D: 3D Generation via Synchronized Diffusion Distillation
We introduce Iris3D, a novel 3D content generation system that generates vivid textures and detailed 3D shapes while preserving the input information. Our system integrates a Multi-View Large Reconstruction Model (MVLRM [Li et al. 2023b])…
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Recent advancements in generative models have enabled 3D urban scene generation from satellite imagery, unlocking promising applications in gaming, digital twins, and beyond. However, most existing methods rely heavily on neural rendering …
DivPro: diverse protein sequence design with direct structure recovery guidance
Motivation Structure-based protein design is crucial for designing proteins with novel structures and functions, which aims to generate sequences that fold into desired structures. Current deep learning-based methods primarily focus on tra…
FlexPainter: Flexible and Multi-View Consistent Texture Generation
Texture map production is an important part of 3D modeling and determines the rendering quality. Recently, diffusion-based methods have opened a new way for texture generation. However, restricted control flexibility and limited prompt mod…
Advancing high-fidelity 3D and Texture Generation with 2.5D latents
Despite the availability of large-scale 3D datasets and advancements in 3D generative models, the complexity and uneven quality of 3D geometry and texture data continue to hinder the performance of 3D generation techniques. In most existin…
Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot
The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer …
DiMeR: Disentangled Mesh Reconstruction Model
We propose DiMeR, a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction. Existing methods confront two persistent obstacles: (i) textures can conceal geometric errors, i.e., visual…
Towards Generalizable Multi-Camera 3D Object Detection via Perspective Rendering
Detecting and localizing objects in 3D space using multiple cameras, known as Multi-Camera 3D Object Detection (MC3D-Det), has gained prominence with the advent of bird's-eye view (BEV) approaches. However, these methods often struggle wit…
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Poster design is a critical medium for visual communication. Prior work has explored automatic poster design using deep learning techniques, but these approaches lack text accuracy, user customization, and aesthetic appeal, limiting their …
Efficient Training-Free High-Resolution Synthesis with Energy Rectification in Diffusion Models
Diffusion models have achieved remarkable progress across various visual generation tasks. However, their performance significantly declines when generating content at resolutions higher than those used during training. Although numerous m…
TransPixeler: Advancing Text-to-Video Generation with Transparency
Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alpha channels for transparency, remains a challenge …
Dual-Balancing for Multi-Task Learning
Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation is the core of the two tasks, as an ideal conditional distribution transfer function from intrinsic properties to RGB images. De…
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-e…
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving
Photorealistic 4D reconstruction of street scenes is essential for developing real-world simulators in autonomous driving. However, most existing methods perform this task offline and rely on time-consuming iterative processes, limiting th…
Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
Recent advances in video generation have shown promise for generating future scenarios, critical for planning and control in autonomous driving and embodied intelligence. However, real-world applications demand more than visually plausible…
LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images
Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, current reconstruction methods often rely on explicit camera pose estimation or fixed viewpoints, restricting …
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
In this work, we introduce FlexGen, a flexible framework designed to generate controllable and consistent multi-view images, conditioned on a single-view image, or a text prompt, or both. FlexGen tackles the challenges of controllable mult…
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
We present OmniBooth, an image generation framework that enables spatial control with instance-level multi-modal customization. For all instances, the multimodal instruction can be described through text prompts or image references. Given …
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
In the realm of image generation, creating customized images from a visual prompt with additional textual instruction emerges as a promising endeavor. However, existing methods, both tuning-based and tuning-free, struggle with interpreting t…
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks. However, existing methods often uncritically use the original diffusion f…
Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
Remote photoplethysmography (rPPG) is gaining prominence for its non-invasive approach to monitoring physiological signals using only cameras. Despite its promise, the adaptability of rPPG models to new, unseen domains is hindered due to t…
DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings…
From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model
We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving appl…
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dens…
Feeding Habits of Scomber japonicus Inferred by Stable Isotope and Fatty Acid Analyses
Scomber japonicus is widely distributed off the coast of Japan and in the northwestern Pacific, and is an important target for fisheries. To reveal the differences in diet shifts and niche changes of S. japonicus, we collected samples in th…
Ontogenetic Variation in the Trophic and Mercury Levels of Japanese Anchovy in the High Seas of the Northwestern Pacific Ocean
The aim of this study was to explore the connection between growth and feeding ecology and mercury (Hg) levels in Japanese anchovy (Engraulis japonicus). We measured the amounts of Hg and stable carbon and nitrogen isotopes in the muscle o…