Explanipedia

Intelligent Decision‐Making Driven by Large AI Models: Progress, Challenges and Prospects Open

You He, Shulan Ruan, Dong Wang, Huchuan Lu, Zhi Li, et al. · 2025

With the rapid development of large AI models, large decision models have further broken through the limits of human cognition and promoted the innovation of decision‐making paradigms in extensive fields such as medicine and transportation…

Regularizing Subspace Redundancy of Low-Rank Adaptation Open

Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, et al. · 2025

Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices rema…

Complementary and Contrastive Learning for Audio-Visual Segmentation Open

Shaogang Gong, Yunzhi Zhuge, Lu Zhang, Pingping Zhang, Huchuan Lu · 2025

Audio-Visual Segmentation (AVS) aims to generate pixel-wise segmentation maps that correlate with the auditory signals of objects. This field has seen significant progress with numerous CNN and Transformer-based methods enhancing the segme…

What Makes You Unique? Attribute Prompt Composition for Object Re-Identification Open

Yingquan Wang, Pingping Zhang, Chong Sun, Dong Wang, Huchuan Lu · 2025

Object Re-IDentification (ReID) aims to recognize individuals across non-overlapping camera views. While recent advances have achieved remarkable progress, most existing models are constrained to either single-domain or cross-domain scenar…

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation Open

Xiaoqi Zhao, Youwei Pang, Chenyang Yu, Lihe Zhang, Huchuan Lu, et al. · 2025

Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they …

Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking Open

Zirui Zheng, Takashi Isobe, Tong Shen, Jia Xu, Jianbin Zhao, et al. · 2025

While autoregressive (AR) models have demonstrated remarkable success in image generation, extending them to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and the risk of feature entangleme…

Power Battery Detection Open

Xiaoqi Zhao, Ping Cao, Zonglei Feng, Lihe Zhang, Hanqi Liu, et al. · 2025

Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense…

Underwater Optical Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future Open

Long Chen, Yuzhi Huang, Junyu Dong, Qi Xu, Sam Kwong, et al. · 2025

Underwater optical object detection (UOD), aiming at identifying and localising objects in underwater optical images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in un…

Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network Open

Jiaxing Yang, Lihe Zhang, Huchuan Lu · 2025

Recently, Referring Remote Sensing Image Segmentation (RRSIS) has aroused wide attention. To handle drastic scale variation of remote targets, existing methods only use the full image as input and nest the saliency-preferring techniques of…

UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model Open

Yilong Hu, Shijie Chang, Lihe Zhang, Tian Feng, Weibing Sun, et al. · 2025

The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness in diffusion models helps address issues such as blurring at the edges of medical images and labe…

GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric Open

Lin Yue, Xiaoxuan Zhang, Yang Liu, Dong Wang, Huchuan Lu · 2025

Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the ge…

Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking Open

Ben Kang, Xin Chen, Jie Zhao, Chunjuan Bo, Dong Wang, et al. · 2025

Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To ad…

Research progress of deubiquitinating enzymes in cerebral ischemia-reperfusion injury Open

Xiaohong Qin, Jing‐Ning Zhu, Huchuan Lu, Meihui Yi, Zheng Zhao, et al. · 2025

Medicine Chemistry Psychology

Cerebral ischemia-reperfusion injury (CIRI) is a critical pathological process driving neurological deterioration following ischemic stroke, involving multifaceted mechanisms such as inflammatory cascades, oxidative stress, and programmed …

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting Open

Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, et al. · 2025

Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, method…

P3Net: Progressive and Periodic Perturbation for Semi-Supervised Medical Image Segmentation Open

Zhenyan Yao, Yongri Piao, Huchuan Lu · 2025

Perturbation with diverse unlabeled data has proven beneficial for semi-supervised medical image segmentation (SSMIS). While many works have successfully used various perturbation techniques, a deeper understanding of learning perturbation…

SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification Open

Xiangyun Hu, Pingping Zhang, Yuhao Wang, Bin Yan, Huchuan Lu · 2025

Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic chang…

CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification Open

Chenyang Yu, Xuehu Liu, Jiawen Zhu, Yuhao Wang, Pingping Zhang, et al. · 2025

Computer science Engineering Biology

Person Re-IDentification (ReID) aims to identify specific persons from non-overlapping cameras. Recently, some works have suggested using large-scale pre-trained vision-language models like CLIP to boost ReID performance. Unfortunately, ex…

Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Open

Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, et al. · 2025

Computer science

Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view se…

SUTrack: Towards Simple and Unified Single Object Tracking Open

Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, et al. · 2025

Computer science Psychology Philosophy

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a s…

Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking Open

Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, et al. · 2025

Computer science Psychology

Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stre…

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt Open

Yuhao Wang, Xuehu Liu, Tianyu Yan, Yang Liu, Aihua Zheng, et al. · 2025

Computer science Chemistry Biology

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performan…

DefMamba: Deformable Visual State Space Model Open

Leiye Liu, Miao Zhang, Yongri Piao, Huchuan Lu · 2025

Recently, state space models (SSM), particularly Mamba, have attracted significant attention from scholars due to their ability to effectively balance computational efficiency and performance. However, most existing visual Mamba methods fl…

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Open

Yuhao Wang, Yongfeng Lv, Pingping Zhang, Huchuan Lu · 2025

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods focus on fusing heterogeneous visual features, neglecting the potential…

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Open

Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, et al. · 2025

Computer science

Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficie…

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge Open

Hongyan Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Pengfei Zhang, et al. · 2025

Computer science

Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks. However, current video understanding models struggle with processing long…

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation Open

Shilei Gong, Yunzhi Zhuge, Pengfei Zhang, Yifan Wang, Pingping Zhang, et al. · 2025

Computer science Chemistry

The essence of audio-visual segmentation (AVS) lies in locating and delineating sound-emitting objects within a video stream. While Transformer-based methods have shown promise, their handling of long-range dependencies struggles due to qu…

Huchuan Lu