Explanipedia

Regularizing Subspace Redundancy of Low-Rank Adaptation Open

Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, et al. · 2025

Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices rema…

What Makes You Unique? Attribute Prompt Composition for Object Re-Identification Open

Yingquan Wang, Pingping Zhang, Chong Sun, Dong Wang, Huchuan Lu · 2025

Object Re-IDentification (ReID) aims to recognize individuals across non-overlapping camera views. While recent advances have achieved remarkable progress, most existing models are constrained to either single-domain or cross-domain scenar…

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation Open

Xiaoqi Zhao, Youwei Pang, Chenyang Yu, Lihe Zhang, Huchuan Lu, et al. · 2025

Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they …

Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking Open

Zirui Zheng, Takashi Isobe, Tong Shen, Jia Xu, Jianbin Zhao, et al. · 2025

While autoregressive (AR) models have demonstrated remarkable success in image generation, extending them to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and the risk of feature entangleme…

Underwater Optical Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future Open

Long Chen, Yuzhi Huang, Junyu Dong, Qi Xu, Sam Kwong, et al. · 2025

Underwater optical object detection (UOD), aiming at identifying and localising objects in underwater optical images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in un…

UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model Open

Yilong Hu, Shijie Chang, Lihe Zhang, Tian Feng, Weibing Sun, et al. · 2025

The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness in diffusion models helps address issues such as blurring at the edges of medical images and labe…

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Open

Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, et al. · 2025

GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric Open

Lin Yue, Xiaoxuan Zhang, Yang Liu, Dong Wang, Huchuan Lu · 2025

Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the ge…

Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking Open

Ben Kang, Xin Chen, Jie Zhao, Chunjuan Bo, Dong Wang, et al. · 2025

Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To ad…

Research progress of deubiquitinating enzymes in cerebral ischemia-reperfusion injury Open

Xiaohong Qin, Jing‐Ning Zhu, Huchuan Lu, Meihui Yi, Zheng Zhao, et al. · 2025

Cerebral ischemia-reperfusion injury (CIRI) is a critical pathological process driving neurological deterioration following ischemic stroke, involving multifaceted mechanisms such as inflammatory cascades, oxidative stress, and programmed …

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting Open

Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, et al. · 2025

Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, method…

The Visual Object Tracking VOT2015 challenge results Open

Matej Kristan, Aleš Leonardis, Jiří Matas, Michael Felsberg, Roman Pflugfelder, et al. · 2025

The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are…

SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification Open

Xiangyun Hu, Pingping Zhang, Yuhao Wang, Bin Yan, Huchuan Lu · 2025

Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic chang…

CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification Open

Chenyang Yu, Xuehu Liu, Jiawen Zhu, Yuhao Wang, Pingping Zhang, et al. · 2025

Person Re-IDentification (ReID) aims to identify specific persons from non-overlapping cameras. Recently, some works have suggested using large-scale pre-trained vision-language models like CLIP to boost ReID performance. Unfortunately, ex…

Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Open

Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, et al. · 2025

Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view se…

SUTrack: Towards Simple and Unified Single Object Tracking Open

Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, et al. · 2025

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a s…

Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking Open

Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, et al. · 2025

Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stre…

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt Open

Yuhao Wang, Xuehu Liu, Tianyu Yan, Yang Liu, Aihua Zheng, et al. · 2025

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performan…

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Open

Yuhao Wang, Yongfeng Lv, Pingping Zhang, Huchuan Lu · 2025

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods focus on fusing heterogeneous visual features, neglecting the potential…

Self-calibrated region-level regression for crowd counting Open

Jiawen Zhu, Wenda Zhao, You He, Huchuan Lu · 2025

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Open

Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, et al. · 2025

Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficie…

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge Open

Hongyan Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Pengfei Zhang, et al. · 2025

Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks. However, current video understanding models struggle with processing long…

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation Open

Shilei Gong, Yunzhi Zhuge, Pengfei Zhang, Yifan Wang, Pingping Zhang, et al. · 2025

The essence of audio-visual segmentation (AVS) lies in locating and delineating sound-emitting objects within a video stream. While Transformer-based methods have shown promise, their handling of long-range dependencies struggles due to qu…

3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Open

Hongyan Xiong, Yunzhi Zhuge, Jiawen Zhu, Pengfei Zhang, Huchuan Lu · 2025

Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representation…

Spatial-temporal initialization dilemma: towards realistic visual tracking Open

Chang Liu, Yongsheng Yuan, Xin Chen, Huchuan Lu, Dong Wang · 2024

In this paper, we first investigate the phenomenon of the spatial-temporal initialization dilemma towards realistic visual tracking, which may adversely affect tracking performance. We summarize the aforementioned phenomenon by comparing d…

ReNeg: Learning Negative Embedding with Reward Guidance Open

Xiaomin Li, Yixuan Liu, Takashi Isobe, Xu Jia, Qinpeng Cui, et al. · 2024

In text-to-image (T2I) generation applications, negative embeddings have proven to be a simple yet effective approach for enhancing generation quality. Typically, these negative embeddings are derived from user-defined negative prompts, wh…

SUTrack: Towards Simple and Unified Single Object Tracking Open

Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, et al. · 2024

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a s…

Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification Open

Yuhao Wang, Pingping Zhang, Xuehu Liu, Zhengzheng Tu, Huchuan Lu · 2024

Person Re-identification (ReID) aims to retrieve the specific person across non-overlapping cameras, which greatly helps intelligent transportation systems. As we all know, Convolutional Neural Networks (CNNs) and Transformers have the uni…
