Explanipedia

Diff-MM: Exploring Pre-trained Text-to-Image Generation Model for Unified Multi-modal Object Tracking

Shiyu Xuan, Zechao Li, Jinhui Tang · 2025

Multi-modal object tracking integrates auxiliary modalities such as depth, thermal infrared, event flow, and language to provide additional information beyond RGB images, showing great potential in improving tracking stabilization in compl…

Dataset Distillation for Histopathology Image Classification

Cong Cong, Shiyu Xuan, Sidong Liu, Maurice Pagnucco, Shiliang Zhang, et al. · 2024

Deep neural networks (DNNs) have exhibited remarkable success in the field of histopathology image analysis. On the other hand, the contemporary trend of employing large models and extensive datasets has underscored the significance of dat…

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Dongkai Wang, Shiyu Xuan, Shiliang Zhang · 2024

The capacity of existing human keypoint localization models is limited by keypoint priors provided by the training data. To alleviate this restriction and pursue a more general model, this work studies keypoint localization from a different …

Decoupled Optimisation for Long-Tailed Visual Recognition

Cong Cong, Shiyu Xuan, Sidong Liu, Shiliang Zhang, Maurice Pagnucco, et al. · 2024

When training on a long-tailed dataset, conventional learning algorithms tend to exhibit a bias towards classes with a larger sample size. Our investigation has revealed that this biased learning tendency originates from the model paramete…

Decoupled Contrastive Learning for Long-Tailed Recognition

Shiyu Xuan, Shiliang Zhang · 2024

Supervised Contrastive Loss (SCL) is popular in visual representation learning. Given an anchor image, SCL pulls two types of positive samples, i.e., its augmentation and other images from the same class, together, while pushing negative ima…

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang · 2023

Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities in various multi-modal tasks. Nevertheless, their performance in fine-grained image understanding tasks is still limited. To address this issue, this paper propos…

Intra-Inter Camera Similarity for Unsupervised Person Re-Identification

Shiyu Xuan, Shiliang Zhang · 2021

Most unsupervised person Re-Identification (Re-ID) works produce pseudo-labels by measuring feature similarity without considering the distribution discrepancy among cameras, leading to degraded accuracy in label computation across …

Siamese networks with distractor-reduction method for long-term visual object tracking

Shiyu Xuan, Shengyang Li, Zifei Zhao, Longxuan Kou, Zhuang Zhou, et al. · 2020

Many trackers that divide the tracking process into two stages have recently been proposed to solve the problem of long-term tracking. Their outstanding performance has made them one of the mainstream approaches to long-term tracking.…

Object Tracking in Satellite Videos by Improved Correlation Filters With Motion Estimations

Shiyu Xuan, Shengyang Li, Mingfei Han, Xue Wan, Gui-Song Xia · 2019

As a new method of Earth observation, video satellite is capable of monitoring specific events on the Earth's surface continuously by providing high-temporal resolution remote sensing images. The video observations enable a variety of new …
