Xuelong Li
Imbalanced learning using the area under the curve and proximal support vector machine for image steganalysis
The proposed research introduces a novel steganalytic tactic termed the Imbalanced Maximizing-AUC Proximal Support Vector Machine (PSVM). This method strengthens detection performance in the presence of imbalanced datasets by integrating A…
Clustering-Oriented Generative Attribute Graph Imputation
Attribute-missing graph clustering has emerged as a significant unsupervised task, where only attribute vectors of partial nodes are available and the graph structure is intact. The related models generally follow the two-step paradigm of …
SVGen: Interpretable Vector Graphics Generation with Large Language Models
Scalable Vector Graphics (SVG) is widely used in front-end development and UI/UX design due to its scalability, editability, and rendering efficiency. However, turning creative ideas into precise vector graphics remains a time-consuming ch…
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding, yet their application to underwater environments remains largely unexplored. Underwater imagery presents unique challenges including sever…
Object-AVEdit: An Object-level Audio-Visual Editing Model
There is high demand for audio-visual editing in video post-production and filmmaking. While numerous models have explored audio and video editing, they struggle with object-level audio-visual operations. Specifically, object-…
Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing
Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and t…
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires the agent to navigate by following natural instructions under partial observability, making it difficult to align perception with language. Recent methods mitigate this by imagining future scen…
CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features
Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separa…
AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars
Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital …
Enhance Vision-Language Alignment with Noise
With the advancement of pre-trained vision-language (VL) models, enhancing the alignment between visual and linguistic modalities in downstream tasks has emerged as a critical challenge. Different from existing fine-tuning methods that add…
Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?
Graph contrastive learning (GCL) has been widely used as an effective self-supervised learning method for graph representation learning. However, how to apply adequate and stable graph augmentation to generate proper views for contrastiv…
Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models
Test-time adaptation (TTA) is crucial in maintaining performance of Vision Language Models (VLMs) when facing distribution shifts, particularly when the source data or target labels are inaccessible. Existing TTA methods predominantly leve…
NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering
Autoregressive models have achieved significant success in image generation. However, unlike the inherent hierarchical structure of image information in the spectral domain, standard autoregressive methods typically generate pixels sequent…
AudioSpa: Spatializing Sound Events with Text
Text-to-audio (TTA) systems have recently demonstrated strong performance in synthesizing monaural audio from text. However, the task of generating binaural spatial audio from text, which provides a more immersive auditory experience by in…
Lensless fiber endomicroscopic phase imaging using a physical model-driven neural network
Learning-based lensless fiber endomicroscopic phase imaging through multi-core fibers (MCF) holds great promise for label-free endomicroscopic imaging of biological samples with minimum invasiveness. However, conventional data-driven deep …
Dual-Bounded Nonlinear Optimal Transport for Size Constrained Min Cut Clustering
Min cut is an important graph partitioning method. However, current solutions to the min cut problem suffer from slow speeds and difficulty in solving, and often converge to simple solutions. To address these issues, we relax the min cut prob…
FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
Open-vocabulary segmentation aims to identify and segment specific regions and objects based on text-based descriptions. A common solution is to leverage powerful vision-language models (VLMs), such as CLIP, to bridge the gap between visio…
Diffusion-driven lensless fiber endomicroscopic quantitative phase imaging toward digital pathology
Beyond Similarity: Mutual Information-Guided Retrieval for In-Context Learning in VQA
A Greedy Strategy for Graph Cut
We propose a Greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters which reduces the value of the global objective fu…
Open-Vocabulary Octree-Graph for 3D Scene Understanding
Open-vocabulary 3D scene understanding is indispensable for embodied agents. Recent works leverage pretrained vision-language models (VLMs) for object segmentation and project them to point clouds to build 3D maps. Despite progress, a poin…
Night-to-Day Translation via Illumination Degradation Disentanglement
Night-to-Day translation (Night2Day) aims to achieve day-like vision for nighttime scenes. However, processing night images with complex degradations remains a significant challenge under unpaired conditions. Previous methods that uniforml…
Physics in Next-token Prediction
We discovered the underlying physics in Next-token Prediction (NTP). We identified the law of information conservation within NTP and proposed the First Law of Information Capacity (IC-1), demonstrating that the essence of intelligence eme…
Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control
Diffusion models demonstrate superior performance in capturing complex distributions from large-scale datasets, providing a promising solution for quadrupedal locomotion control. However, the robustness of the diffusion planner is inherent…
SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy
Concealed object detection (COD) in cluttered scenes is significant for various image processing applications. However, because concealed objects are always similar to their background, it is extremely hard to distinguish them. Here, t…
FastUMI: A Scalable and Hardware-Independent Universal Manipulation Interface with Dataset
Real-world manipulation data involving robotic arms is crucial for developing generalist action policies, yet such data remains scarce since existing data collection methods are hindered by high costs, hardware dependencies, and complex se…
Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement
Despite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further advancements. This work proposes a mean-teacher-based semi-supervised l…
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models
Leveraging the powerful reasoning capabilities of large language models (LLMs), recent LLM-based robot task planning methods yield promising results. However, they mainly focus on single or multiple homogeneous robots on simple tasks. Prac…