Xinchao Wang
SparseD: Sparse Attention for Diffusion Language Models
While diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs), existing open-source DLMs suffer from high inference latency. This bottleneck is mainly due to the attention's quadratic complexity with r…
Multi-material topology optimization for buckling-resistant designs under thermo-mechanical coupling loads
Existing topology optimization research predominantly isolates thermal or mechanical effects, with insufficient attention to their coupled interactions. Furthermore, most studies neglect buckling constraints—critical for structural stabili…
Landslide susceptibility assessment of upper Yellow River using coupling statistical approaches, machine learning algorithms and SBAS-InSAR technique
Landslide disasters frequently occur in the upper reaches of the Yellow River, particularly within the Gonghe to Xunhua section. A precise evaluation of landslide susceptibility is vital for effective disaster prevention and mitigation. In…
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Layout-to-Image generation aims to create complex scenes with precise control over the placement and arrangement of subjects. Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without tr…
Test3R: Learning to Reconstruct 3D at Test Time
Dense matching methods like DUSt3R regress pairwise pointmaps for 3D reconstruction. However, the reliance on pairwise prediction and the limited generalization capability inherently restrict the global geometric consistency. In this work,…
Discrete Diffusion in Large Language and Multimodal Models: A Survey
In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decodi…
Image Editing As Programs with Diffusion Models
While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle w…
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Multimodal large language models (MLLMs) demonstrate remarkable capabilities in handling complex multimodal tasks and are increasingly adopted in video understanding applications. However, their rapid advancement raises serious data privac…
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Multimodal large language models (MLLMs) have recently achieved significant progress in visual tasks, including semantic scene understanding and text-image alignment, with reasoning variants enhancing performance on complex tasks involving…
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, an…
dKV-Cache: The Cache for Diffusion Language Models
Diffusion Language Models (DLMs) have emerged as a promising competitor to autoregressive language models. However, diffusion language models have long been constrained by slow inference. A core challenge is that their non-autoregressiv…
Investigating olive pomace activated carbon for degrading organic dyes in water
Safety evaluation after LASIK surgery based on the linear creep property: a finite element analysis
AIM: To investigate the effect of the percent tissue altered (PTA) on the safety after laser-assisted in situ keratomileusis (LASIK) based on linear creep characteristics. METHODS: The linear creep characteristics of the cornea were charac…
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Self-supervised representation learning for point cloud has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. However, as pre-trained models grow in complexity, fully fine-tuning them for downstrea…
GFlow: Recovering 4D World from Monocular Video
Recovering 4D world from monocular video is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints…
Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications
Graph Neural Networks (GNNs) have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques. Despite the evolution of augmentation methods, issues like graph property dist…
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Rendering dynamic scenes from monocular videos is a crucial yet challenging task. The recent deformable Gaussian Splatting has emerged as a robust solution to represent real-world dynamic scenes. However, it often leads to heavily redundan…
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
4D Gaussian Splatting (4DGS) has recently gained considerable attention as a method for reconstructing dynamic scenes. Despite achieving superior quality, 4DGS typically requires substantial storage and suffers from slow rendering speed. I…
OminiControl2: Efficient Conditioning for Diffusion Transformers
Fine-grained control of text-to-image diffusion transformer models (DiT) remains a critical challenge for practical deployment. While recent advances such as OminiControl and others have enabled controllable generation across diverse control…
Understanding Dataset Distillation via Spectral Filtering
Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a sp…
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Graph neural networks (GNNs) are conventionally trained on a per-domain, per-task basis, which creates a significant barrier to transferring the acquired knowledge to different, heterogeneous data setups. This paper introduces GraphBridge, a …
Introducing Visual Perception Token into Multimodal Large Language Model
To utilize visual information, a Multimodal Large Language Model (MLLM) relies on the perception process of its vision encoder. The completeness and accuracy of visual perception significantly influence the precision of spatial reasoning, fi…
Few-shot Implicit Function Generation via Equivariance
Implicit Neural Representations (INRs) have emerged as a powerful framework for representing continuous signals. However, generating diverse INR weights remains challenging due to limited training data. We introduce Few-shot Implicit Funct…
Potential landslide detection and influencing factors analysis in the upper Yellow River based on SBAS-InSAR technology
This study examined the frequent occurrence of landslide disasters in the upper reaches of the Yellow River (from Gonghe to Xunhua) using Sentinel-1A data from January 2021 to December 2023 and integrating it with small baseline subset int…
Open-World Authorship Attribution
Caterpillar-Inspired Electromagnetic Dual-Function Cuniculi Composites for Efficient Board Bandwidth Microwave Absorption
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Diffusion Transformers (DiT) have become a leading architecture in image generation. However, the quadratic complexity of attention mechanisms, which are responsible for modeling token-wise relationships, results in significant latency whe…
Language Model as Visual Explainer
In this paper, we present Language Model as Visual Explainer (LVX), a systematic approach for interpreting the internal workings of vision models using a tree-structured linguistic explanation, without the need for model training. Central to…
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
Transformer-based diffusion models have achieved significant advancements across a variety of generative tasks. However, producing high-quality outputs typically necessitates large transformer models, which result in substantial training a…
One-shot Federated Learning via Synthetic Distiller-Distillate Communication
One-shot Federated Learning (FL) is a powerful technique that facilitates collaborative training of machine learning models in a single round of communication. While its superiority lies in communication efficiency and privacy preservation co…