Fumin Shen
Chatting with Interactive Memory for Text-based Person Retrieval (ChinaMM 2024)
Text-based person retrieval aims to match a specific pedestrian image with textual descriptions. Traditional approaches have largely focused on utilizing a "single-shot" query with a text description. They may not align well with real-world s…
Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods to…
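Pair-based DML objectives score pairs of embeddings: same-class pairs are pulled together and different-class pairs are pushed apart. As a minimal sketch of that pair-based family (not the Anti-Collapse Loss described above), here is a generic margin-based contrastive loss in NumPy; the function name `contrastive_loss` and the default margin are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Generic pair-based contrastive loss, averaged over a batch of pairs.

    emb_a, emb_b: (N, D) embeddings forming N pairs.
    same: (N,) with 1 for same-class (positive) pairs, 0 for negatives.
    margin: illustrative value; negatives closer than this are penalized.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=float)
    d = np.linalg.norm(emb_a - emb_b, axis=1)             # Euclidean distance per pair
    pos = same * d ** 2                                    # pull positives together
    neg = (1.0 - same) * np.maximum(0.0, margin - d) ** 2  # push negatives beyond the margin
    return float(np.mean(pos + neg))
```

A positive pair at distance 0 and a negative pair already farther apart than the margin both contribute zero loss, so only "violating" pairs drive the gradient.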
PTAN: Principal Token-aware Adjacent Network for Compositional Temporal Grounding
Compositional temporal grounding (CTG) aims to localize the most relevant segment from an untrimmed video based on a given natural language sentence, and the test samples for this task contain novel components not seen in training. However…
Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning
Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, existing loss functions or mining strategies often necessitate the incorporation of additional hyperparameters, notably the …
Dual Dynamic Threshold Adjustment Strategy
Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, the existing loss function or mining strategy often necessitates the incorporation of additional hyperparameters, notably the…
Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval
Text-based person retrieval aims at retrieving a specific pedestrian image from a gallery based on textual descriptions. The primary challenge is how to overcome the inherent heterogeneous modality gap in the situation of significant intra…
Hierarchical Graph Pattern Understanding for Zero-Shot VOS
The optical flow guidance strategy is ideal for obtaining motion information of objects in the video. It is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical …
BatchNorm-based Weakly Supervised Video Anomaly Detection
In weakly supervised video anomaly detection (WVAD), where only video-level labels indicating the presence or absence of abnormal events are available, the primary challenge arises from the inherent ambiguity in temporal annotations of abn…
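With only video-level labels, a common baseline treats each video as a bag of snippets and pools snippet scores into one video score that can be compared with the label. The following is a minimal sketch of such a top-k multiple-instance loss, assuming per-snippet anomaly probabilities are already available; it is a generic baseline, not the BatchNorm-based method of this paper, and `video_level_bce` and `k` are hypothetical names.

```python
import numpy as np

def video_level_bce(snippet_scores, video_label, k=3, eps=1e-7):
    """Top-k MIL loss: average the k highest snippet scores into a video score.

    snippet_scores: (T,) anomaly probabilities in (0, 1) for T snippets.
    video_label: 1 if the video contains an abnormal event, else 0.
    k: hypothetical number of top-scoring snippets to pool.
    """
    scores = np.asarray(snippet_scores, dtype=float)
    k = min(k, scores.shape[0])
    topk = np.sort(scores)[-k:]                    # the k most anomalous snippets
    p = float(np.clip(topk.mean(), eps, 1 - eps))  # video-level anomaly probability
    # Binary cross-entropy against the video-level label only
    return float(-(video_label * np.log(p) + (1 - video_label) * np.log(1 - p)))
```

Pooling only the top-k snippets reflects the ambiguity the abstract describes: an abnormal video is labeled as a whole, but only some of its snippets are expected to be anomalous.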
MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection
Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training. Some UAD applications intend to further locate the anomalous regions w…
AnoOnly: Semi-Supervised Anomaly Detection with the Only Loss on Anomalies
Semi-supervised anomaly detection (SSAD) methods have demonstrated their effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging few-shot but instructive abnormal instances. However, the dominance of homogeneous norma…
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background …
Attention Map Guided Transformer Pruning for Edge Device
Due to its significant capability of modeling long-range dependencies, vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks. However, the inherent problems of transfor…
Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
Optical flow is an easily conceived and valuable cue for advancing unsupervised video object segmentation (UVOS). Most of the previous methods directly extract and fuse the motion and appearance features for segmenting target objects in th…
TVT: Three-Way Vision Transformer through Multi-Modal Hypersphere Learning for Zero-Shot Sketch-Based Image Retrieval
In this paper, we study the zero-shot sketch-based image retrieval (ZS-SBIR) task, which retrieves natural images related to sketch queries from unseen categories. In the literature, convolutional neural networks (CNNs) have become the de-…
Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation with only image-level labels aims to reduce annotation costs for the segmentation task. Existing approaches generally leverage class activation maps (CAMs) to locate the object regions for pseudo …
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
Learning from the web can ease the extreme dependence of deep learning on large-scale manually labeled datasets. Especially for fine-grained recognition, which aims to distinguish subordinate categories, it will significantly reduce …
PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation
Graph neural networks (GNNs) have been widely used in the 3D human pose estimation task, since the pose representation of a human body can be naturally modeled by the graph structure. Generally, most of the existing GNN-based models utiliz…
Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning
The recent success of audio-visual representation learning can be largely attributed to their pervasive concurrency property, which can be used as a self-supervision signal to extract correlation information. While most recent works focu…
Prototype-supervised Adversarial Network for Targeted Attack of Deep Hashing
Due to its powerful capability of representation learning and high-efficiency computation, deep hashing has made significant progress in large-scale image retrieval. However, deep hashing networks are vulnerable to adversarial examples, wh…
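The efficiency of deep hashing comes from mapping each image to a compact binary code, so that retrieval reduces to ranking database codes by Hamming distance to the query code. The following is a minimal NumPy sketch of that retrieval step, assuming the network's real-valued outputs are binarized with the sign function; the function names are hypothetical, and the sketch does not implement the targeted attack proposed in the paper.

```python
import numpy as np

def binarize(features):
    """Map real-valued network outputs to {-1, +1} hash codes via the sign."""
    return np.where(np.asarray(features) >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query code.

    For codes in {-1, +1}^K, Hamming distance = (K - <q, b>) / 2,
    so one matrix-vector product scores the whole database.
    """
    k = query_code.shape[0]
    inner = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    dist = (k - inner) // 2
    order = np.argsort(dist, kind="stable")  # nearest codes first
    return order, dist
```

Usage: `hamming_rank(binarize(q_feat), binarize(db_feats))` returns database indices from nearest to farthest. An adversarial example for deep hashing aims to perturb the query so that this ranking changes.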
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
Semantic segmentation aims to classify every pixel of an input image. Considering the difficulty of acquiring dense labels, researchers have recently been resorting to weak labels to alleviate the annotation burden of segmentation. However…
Jo-SRC: A Contrastive Approach for Combating Noisy Labels
Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance. Existing state-of-the-art methods primarily adopt a sample selection strategy, which selects small-los…
Semantically Meaningful Class Prototype Learning for One-Shot Image Semantic Segmentation
One-shot semantic image segmentation aims to segment the object regions for the novel class with only one annotated image. Recent works adopt the episodic training strategy to mimic the expected situation at testing time. However, these ex…
Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Noisy Samples and Utilizing Hard Ones
Labeling objects at a subordinate level typically requires expert knowledge, which is not always available when using random annotators. As such, learning directly from web images for fine-grained recognition has attracted broad attention.…
A Survey Of zero shot detection: Methods and applications
Zero-shot learning (ZSL) aims to identify objects whose labels are unavailable during training. This learning paradigm gives a classifier the ability to distinguish unseen classes. Traditional ZSL methods focus only on the image recog…
Dual ResGCN for Balanced Scene Graph Generation
Visual scene graph generation is a challenging task. Previous works have achieved great progress, but most of them do not explicitly consider the class imbalance issue in scene graph generation. Models learned without considering the class…
Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification
Labeling objects at the subordinate level typically requires expert knowledge, which is not always available from a random annotator. Accordingly, learning directly from web images for fine-grained visual classification (FGVC) has attracte…
Auto-Encoding Twin-Bottleneck Hashing
Conventional unsupervised hashing methods usually take advantage of similarity graphs, which are either pre-computed in the high-dimensional space or obtained from random anchor points. On the one hand, existing methods uncouple the proced…
Fast Large-Scale Discrete Optimization Based on Principal Coordinate Descent
Binary optimization, a representative subclass of discrete optimization, plays an important role in mathematical optimization and has various applications in computer vision and machine learning. Usually, binary optimization problems are N…
MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning
MixUp is an effective data augmentation method to regularize deep neural networks via random linear interpolations between pairs of samples and their labels. It plays an important role in model regularization, semi-supervised learning and …
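The random linear interpolation described above can be sketched in a few lines. This is standard MixUp with the interpolation coefficient drawn from a fixed Beta(alpha, alpha) distribution, which is the fixed policy that MetaMixUp replaces with a learned one; it is not MetaMixUp itself. The function name, the one-hot label format, and the default `alpha` are illustrative assumptions.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Standard MixUp: blend each sample and its label with a shuffled partner.

    x: (N, ...) input batch; y_onehot: (N, C) one-hot labels.
    alpha: illustrative Beta(alpha, alpha) parameter for the mixing weight.
    Returns the mixed inputs, mixed labels, mixing weight and pairing.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))    # interpolation coefficient in [0, 1]
    perm = rng.permutation(x.shape[0])     # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam, perm
```

Because inputs and labels are mixed with the same weight, each mixed label is still a valid probability distribution over the classes.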