Huchuan Lu
Intelligent Decision‐Making Driven by Large AI Models: Progress, Challenges and Prospects Open
With the rapid development of large AI models, large decision models have further broken through the limits of human cognition and promoted the innovation of decision‐making paradigms in extensive fields such as medicine and transportation…
Regularizing Subspace Redundancy of Low-Rank Adaptation Open
Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices rema…
Complementary and Contrastive Learning for Audio-Visual Segmentation Open
Audio-Visual Segmentation (AVS) aims to generate pixel-wise segmentation maps that correlate with the auditory signals of objects. This field has seen significant progress with numerous CNN and Transformer-based methods enhancing the segme…
What Makes You Unique? Attribute Prompt Composition for Object Re-Identification Open
Object Re-IDentification (ReID) aims to recognize individuals across non-overlapping camera views. While recent advances have achieved remarkable progress, most existing models are constrained to either single-domain or cross-domain scenar…
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation Open
Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they …
Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking Open
While autoregressive (AR) models have demonstrated remarkable success in image generation, extending them to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and the risk of feature entangleme…
Power Battery Detection Open
Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense…
Underwater Optical Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future Open
Underwater optical object detection (UOD), aiming at identifying and localising objects in underwater optical images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in un…
Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network Open
Recently, Referring Remote Sensing Image Segmentation (RRSIS) has aroused wide attention. To handle drastic scale variation of remote targets, existing methods only use the full image as input and nest the saliency-preferring techniques of…
UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model Open
The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness in diffusion models helps address issues such as blurring at the edges of medical images and labe…
GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric Open
Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the ge…
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking Open
Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To ad…
Research progress of deubiquitinating enzymes in cerebral ischemia-reperfusion injury Open
Cerebral ischemia-reperfusion injury (CIRI) is a critical pathological process driving neurological deterioration following ischemic stroke, involving multifaceted mechanisms such as inflammatory cascades, oxidative stress, and programmed …
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting Open
Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, method…
P3Net: Progressive and Periodic Perturbation for Semi-Supervised Medical Image Segmentation Open
Perturbation with diverse unlabeled data has proven beneficial for semi-supervised medical image segmentation (SSMIS). While many works have successfully used various perturbation techniques, a deeper understanding of learning perturbation…
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification Open
Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic chang…
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification Open
Person Re-IDentification (ReID) aims to identify specific persons from non-overlapping cameras. Recently, some works have suggested using large-scale pre-trained vision-language models like CLIP to boost ReID performance. Unfortunately, ex…
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Open
Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view se…
SUTrack: Towards Simple and Unified Single Object Tracking Open
In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a s…
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking Open
Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stre…
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt Open
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performan…
DefMamba: Deformable Visual State Space Model Open
Recently, state space models (SSM), particularly Mamba, have attracted significant attention from scholars due to their ability to effectively balance computational efficiency and performance. However, most existing visual Mamba methods fl…
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Open
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods focus on fusing heterogeneous visual features, neglecting the potential…
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Open
Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficie…
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge Open
Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks. However, current video understanding models struggle with processing long…
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation Open
The essence of audio-visual segmentation (AVS) lies in locating and delineating sound-emitting objects within a video stream. While Transformer-based methods have shown promise, their handling of long-range dependencies struggles due to qu…