Xinchao Wang
SparseD: Sparse Attention for Diffusion Language Models
While diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs), existing open-source DLMs suffer from high inference latency. This bottleneck is mainly due to the attention's quadratic complexity with r…
Multi-material topology optimization for buckling-resistant designs under thermo-mechanical coupling loads
Existing topology optimization research predominantly isolates thermal or mechanical effects, with insufficient attention to their coupled interactions. Furthermore, most studies neglect buckling constraints—critical for structural stabili…
Landslide susceptibility assessment of upper Yellow River using coupling statistical approaches, machine learning algorithms and SBAS-InSAR technique
Landslide disasters frequently occur in the upper reaches of the Yellow River, particularly within the Gonghe to Xunhua section. A precise evaluation of landslide susceptibility is vital for effective disaster prevention and mitigation. In…
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Layout-to-Image generation aims to create complex scenes with precise control over the placement and arrangement of subjects. Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without tr…
Test3R: Learning to Reconstruct 3D at Test Time
Dense matching methods like DUSt3R regress pairwise pointmaps for 3D reconstruction. However, the reliance on pairwise prediction and the limited generalization capability inherently restrict the global geometric consistency. In this work,…
Discrete Diffusion in Large Language and Multimodal Models: A Survey
In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decodi…
Image Editing As Programs with Diffusion Models
While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle w…
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Multimodal large language models (MLLMs) demonstrate remarkable capabilities in handling complex multimodal tasks and are increasingly adopted in video understanding applications. However, their rapid advancement raises serious data privac…
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Multimodal large language models (MLLMs) have recently achieved significant progress in visual tasks, including semantic scene understanding and text-image alignment, with reasoning variants enhancing performance on complex tasks involving…
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, an…
dKV-Cache: The Cache for Diffusion Language Models
Diffusion Language Models (DLMs) have emerged as a promising competitor to autoregressive language models. However, diffusion language models have long been constrained by slow inference. A core challenge is that their non-autoregressiv…
Investigating olive pomace activated carbon for degrading organic dyes in water
Safety evaluation after LASIK surgery based on the linear creep property: a finite element analysis
AIM: To investigate the effect of the percent tissue altered (PTA) on the safety after laser-assisted in situ keratomileusis (LASIK) based on linear creep characteristics. METHODS: The linear creep characteristics of the cornea were charac…
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Self-supervised representation learning for point cloud has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. However, as pre-trained models grow in complexity, fully fine-tuning them for downstrea…
GFlow: Recovering 4D World from Monocular Video
Recovering 4D world from monocular video is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints…
Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications
Graph Neural Networks (GNNs) have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques. Despite the evolution of augmentation methods, issues like graph property dist…
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Rendering dynamic scenes from monocular videos is a crucial yet challenging task. The recent deformable Gaussian Splatting has emerged as a robust solution to represent real-world dynamic scenes. However, it often leads to heavily redundan…
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
4D Gaussian Splatting (4DGS) has recently gained considerable attention as a method for reconstructing dynamic scenes. Despite achieving superior quality, 4DGS typically requires substantial storage and suffers from slow rendering speed. I…
OminiControl2: Efficient Conditioning for Diffusion Transformers
Fine-grained control of text-to-image diffusion transformer models (DiT) remains a critical challenge for practical deployment. While recent advances such as OminiControl and others have enabled controllable generation across diverse control…
Understanding Dataset Distillation via Spectral Filtering
Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a sp…
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Graph neural networks (GNNs) are conventionally trained on a per-domain, per-task basis, which creates a significant barrier to transferring the acquired knowledge to different, heterogeneous data setups. This paper introduces GraphBridge, a …
Introducing Visual Perception Token into Multimodal Large Language Model
To utilize visual information, a Multimodal Large Language Model (MLLM) relies on the perception process of its vision encoder. The completeness and accuracy of visual perception significantly influence the precision of spatial reasoning, fi…
Few-shot Implicit Function Generation via Equivariance
Implicit Neural Representations (INRs) have emerged as a powerful framework for representing continuous signals. However, generating diverse INR weights remains challenging due to limited training data. We introduce Few-shot Implicit Funct…
Potential landslide detection and influencing factors analysis in the upper Yellow River based on SBAS-InSAR technology
This study examined the frequent occurrence of landslide disasters in the upper reaches of the Yellow River (from Gonghe to Xunhua) using Sentinel-1A data from January 2021 to December 2023 and integrating it with small baseline subset int…
Open-World Authorship Attribution
Caterpillar-Inspired Electromagnetic Dual-Function Cuniculi Composites for Efficient Board Bandwidth Microwave Absorption
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Diffusion Transformers (DiT) have become a leading architecture in image generation. However, the quadratic complexity of attention mechanisms, which are responsible for modeling token-wise relationships, results in significant latency whe…
Language Model as Visual Explainer
In this paper, we present Language Model as Visual Explainer (LVX), a systematic approach for interpreting the internal workings of vision models using a tree-structured linguistic explanation, without the need for model training. Central to…
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
Transformer-based diffusion models have achieved significant advancements across a variety of generative tasks. However, producing high-quality outputs typically necessitates large transformer models, which result in substantial training a…
One-shot Federated Learning via Synthetic Distiller-Distillate Communication
One-shot Federated Learning (FL) is a powerful technique that facilitates collaborative training of machine learning models in a single round of communication. While its superiority lies in communication efficiency and privacy preservation co…