Huchuan Lu
Regularizing Subspace Redundancy of Low-Rank Adaptation Open
Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices rema…
What Makes You Unique? Attribute Prompt Composition for Object Re-Identification Open
Object Re-IDentification (ReID) aims to recognize individuals across non-overlapping camera views. While recent advances have achieved remarkable progress, most existing models are constrained to either single-domain or cross-domain scenar…
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation Open
Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they …
Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking Open
While autoregressive (AR) models have demonstrated remarkable success in image generation, extending them to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and the risk of feature entangleme…
Underwater Optical Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future Open
Underwater optical object detection (UOD), aiming at identifying and localising objects in underwater optical images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in un…
UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model Open
The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness in diffusion models helps address issues such as blurring at the edges of medical images and labe…
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Open
GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric Open
Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the ge…
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking Open
Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To ad…
Research progress of deubiquitinating enzymes in cerebral ischemia-reperfusion injury Open
Cerebral ischemia-reperfusion injury (CIRI) is a critical pathological process driving neurological deterioration following ischemic stroke, involving multifaceted mechanisms such as inflammatory cascades, oxidative stress, and programmed …
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting Open
Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, method…
The Visual Object Tracking VOT2015 challenge results Open
The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are…
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification Open
Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic chang…
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification Open
Person Re-IDentification (ReID) aims to identify specific persons from non-overlapping cameras. Recently, some works have suggested using large-scale pre-trained vision-language models like CLIP to boost ReID performance. Unfortunately, ex…
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Open
Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view se…
SUTrack: Towards Simple and Unified Single Object Tracking Open
In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a s…
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking Open
Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stre…
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt Open
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performan…
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Open
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods focus on fusing heterogeneous visual features, neglecting the potential…
Self-calibrated region-level regression for crowd counting Open
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Open
Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficie…
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge Open
Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks. However, current video understanding models struggle with processing long…
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation Open
The essence of audio-visual segmentation (AVS) lies in locating and delineating sound-emitting objects within a video stream. While Transformer-based methods have shown promise, their handling of long-range dependencies struggles due to qu…
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Open
Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representation…
Spatial-temporal initialization dilemma: towards realistic visual tracking Open
In this paper, we first investigate the phenomenon of the spatial-temporal initialization dilemma towards realistic visual tracking, which may adversely affect tracking performance. We summarize the aforementioned phenomenon by comparing d…
ReNeg: Learning Negative Embedding with Reward Guidance Open
In text-to-image (T2I) generation applications, negative embeddings have proven to be a simple yet effective approach for enhancing generation quality. Typically, these negative embeddings are derived from user-defined negative prompts, wh…
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification Open
Person Re-identification (ReID) aims to retrieve the specific person across non-overlapping cameras, which greatly helps intelligent transportation systems. As we all know, Convolutional Neural Networks (CNNs) and Transformers have the uni…