
Imbalanced learning using the area under the curve and proximal support vector machine for image steganalysis

Xuelong Li · 2025

This research introduces a novel steganalytic approach termed the Imbalanced Maximizing-AUC Proximal Support Vector Machine (PSVM). The method improves detection performance on imbalanced datasets by integrating A…

Clustering-Oriented Generative Attribute Graph Imputation

Mulin Chen, Bo-Cheng Wang, J.M. Zhong, Zongcheng Miao, Xuelong Li · 2025

Attribute-missing graph clustering has emerged as a significant unsupervised task, in which attribute vectors are available for only some of the nodes while the graph structure is intact. The related models generally follow the two-step paradigm of …

SVGen: Interpretable Vector Graphics Generation with Large Language Models

Feiyu Wang, Zhiyuan Zhao, Yuandong Liu, Da Zhang, Junyu Gao, et al. · 2025

Scalable Vector Graphics (SVG) is widely used in front-end development and UI/UX design due to its scalability, editability, and rendering efficiency. However, turning creative ideas into precise vector graphics remains a time-consuming ch…

UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding

Da Zhang, Chengbo Rong, Bingyu Li, Feiyu Wang, Zhiyuan Zhao, et al. · 2025

Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding, yet their application to underwater environments remains largely unexplored. Underwater imagery presents unique challenges including sever…

Object-AVEdit: An Object-level Audio-Visual Editing Model

Youquan Fu, Rui Si, Hongfa Wang, Dongzhan Zhou, Jiacheng Sun, et al. · 2025

There is a high demand for audio-visual editing in video post-production and filmmaking. While numerous models have explored audio and video editing, they struggle with object-level audio-visual operations. Specifically, object-…

Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

Bingyu Li, Hui Dong, Da Zhang, Zhiyuan Zhao, Junyu Gao, et al. · 2025

Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and t…

Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation

Pingrui Zhang, Yifei Su, Pengyuan Wu, Dong An, Li Zhang, et al. · 2025

Vision-and-Language Navigation (VLN) requires the agent to navigate by following natural instructions under partial observability, making it difficult to align perception with language. Recent methods mitigate this by imagining future scen…

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

Xi Feng, Deyu Zhang, Sheng Hu, Xuelong Li, Min Wu, et al. · 2025

Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separa…

AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars

T. Zhang, Jian Zhao, Ye Li, Zheng Zhu, Ping Hu, et al. · 2025

Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital …

Enhance Vision-Language Alignment with Noise

Sida Huang, Hongyuan Zhang, Xuelong Li · 2025

With the advancement of pre-trained vision-language (VL) models, enhancing the alignment between visual and linguistic modalities in downstream tasks has emerged as a critical challenge. Different from existing fine-tuning methods that add…

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang, Xuelong Li · 2025

Graph contrastive learning (GCL) has been widely used as an effective self-supervised learning method for graph representation learning. However, how to apply adequate and stable graph augmentation to generate proper views for contrastiv…

Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models

Xiaozhen Qiao, Peng Huang, Jiakang Yuan, Xiang Guo, Bin Ye, et al. · 2025

Test-time adaptation (TTA) is crucial in maintaining performance of Vision Language Models (VLMs) when facing distribution shifts, particularly when the source data or target labels are inaccessible. Existing TTA methods predominantly leve…

NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering

Zhihao Huang, Xi Qiu, Yue Ma, Ying Zhou, Junjie Chen, et al. · 2025

Autoregressive models have achieved significant success in image generation. However, unlike the inherent hierarchical structure of image information in the spectral domain, standard autoregressive methods typically generate pixels sequent…

AudioSpa: Spatializing Sound Events with Text

Linfeng Feng, Zhao Lei, Boyu Zhu, Xiao-Lei Zhang, Xuelong Li · 2025

Text-to-audio (TTA) systems have recently demonstrated strong performance in synthesizing monaural audio from text. However, the task of generating binaural spatial audio from text, which provides a more immersive auditory experience by in…

Lensless fiber endomicroscopic phase imaging using a physical model-driven neural network

Yuhui Tang, Bin Zhao, Xinyi Ye, Jiawei Sun, Xuelong Li · 2025

Learning-based lensless fiber endomicroscopic phase imaging through multi-core fibers (MCF) holds great promise for label-free endomicroscopic imaging of biological samples with minimal invasiveness. However, conventional data-driven deep …

Dual-Bounded Nonlinear Optimal Transport for Size Constrained Min Cut Clustering

Fangyuan Xie, Jinghui Yuan, Feiping Nie, Xuelong Li · 2025

Min cut is an important graph partitioning method. However, current solutions to the min cut problem are slow and difficult to optimize, and they often converge to simple solutions. To address these issues, we relax the min cut prob…

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation

Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li · 2025

Open-vocabulary segmentation aims to identify and segment specific regions and objects based on text-based descriptions. A common solution is to leverage powerful vision-language models (VLMs), such as CLIP, to bridge the gap between visio…

Diffusion-driven lensless fiber endomicroscopic quantitative phase imaging toward digital pathology

Zhaoqing Chen, Jiawei Sun, Xibin Yang, Xinyi Ye, Bin Zhao, et al. · 2025

Beyond Similarity: Mutual Information-Guided Retrieval for In-Context Learning in VQA

Jun Zhang, Lv Zezhong, Jian Zhao, Yan Wang, Tianle Zhang, et al. · 2025

A Greedy Strategy for Graph Cut

Feiping Nie, Shenfei Pei, Zengwei Zheng, Rong Wang, Xuelong Li · 2024

We propose a greedy strategy, called GGC, to solve the Graph Cut problem. It starts from the state where each data sample is regarded as its own cluster and dynamically merges the two clusters whose merger reduces the value of the global objective fu…

Enhance Vision-Language Alignment with Noise

Sida Huang, Hongyuan Zhang, Xuelong Li · 2024

With the advancement of pre-trained vision-language (VL) models, enhancing the alignment between visual and linguistic modalities in downstream tasks has emerged as a critical challenge. Different from existing fine-tuning methods that add…

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang, Xuelong Li · 2024

Graph contrastive learning (GCL) has been widely used as an effective self-supervised learning method for graph representation learning. However, how to apply adequate and stable graph augmentation to generate proper views for contrastiv…

Open-Vocabulary Octree-Graph for 3D Scene Understanding

Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, et al. · 2024

Open-vocabulary 3D scene understanding is indispensable for embodied agents. Recent works leverage pretrained vision-language models (VLMs) for object segmentation and project them to point clouds to build 3D maps. Despite progress, a poin…

Night-to-Day Translation via Illumination Degradation Disentanglement

Guanzhou Lan, Yuqi Yang, Zhigang Wang, Dong Wang, Bin Zhao, et al. · 2024

Night-to-Day translation (Night2Day) aims to achieve day-like vision for nighttime scenes. However, processing night images with complex degradations remains a significant challenge under unpaired conditions. Previous methods that uniforml…

Physics in Next-token Prediction

Hongjun An, Y Song, Xuelong Li · 2024

We discovered the underlying physics in Next-token Prediction (NTP). We identified the law of information conservation within NTP and proposed the First Law of Information Capacity (IC-1), demonstrating that the essence of intelligence eme…

Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control

Xinyi Yuan, Zhong-Xia Shang, Zifan Wang, Chenkai Wang, Shan‐Chao Zhao, et al. · 2024

Diffusion models demonstrate superior performance in capturing complex distributions from large-scale datasets, providing a promising solution for quadrupedal locomotion control. However, the robustness of the diffusion planner is inherent…

SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy

Yuhan Kang, Qingpeng Li, Leyuan Fang, Jian Zhao, Xuelong Li · 2024

Concealed object detection (COD) in cluttered scenes is significant for various image processing applications. However, because concealed objects are always similar to their backgrounds, it is extremely hard to distinguish them. Here, t…

FastUMI: A Scalable and Hardware-Independent Universal Manipulation Interface with Dataset

Ziniu Wu, Tianyu Wang, Zhaxizhuoma, Cuntai Guan, Zhiqiang Jia, et al. · 2024

Real-world manipulation data involving robotic arms is crucial for developing generalist action policies, yet such data remains scarce since existing data collection methods are hindered by high costs, hardware dependencies, and complex se…

Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

Guanlin Li, Ke Zhang, Ting Wang, Ming Li, Bin Zhao, et al. · 2024

Despite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further advancements. This work proposes a mean-teacher-based semi-supervised l…

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, et al. · 2024

Leveraging the powerful reasoning capabilities of large language models (LLMs), recent LLM-based robot task planning methods yield promising results. However, they mainly focus on single or multiple homogeneous robots on simple tasks. Prac…

Xuelong Li