Hanzi Wang
FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data
Semi-supervised learning (SSL) has achieved significant progress by leveraging both labeled data and unlabeled data. Existing SSL methods overlook a common real-world scenario when labeled data is extremely scarce, potentially as limited a…
Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning
In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist, posing obstacles to the model training and performance. Existing studies on long-tailed noisy label learning (LTNLL) typically assume that …
You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL), leading to significant performance gaps compared to centralized learning. Previous research found t…
Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection Under Noisy Annotations
Automatic X-ray prohibited item detection is vital for public safety. Existing deep learning-based methods all assume that the annotations of training X-ray images are correct. However, obtaining correct annotations is extremely hard if no…
Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition
Most facial expression recognition (FER) models are trained on large-scale expression data with centralized learning. Unfortunately, collecting a large amount of centralized expression data is difficult in practice due to privacy concer…
Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition
In recent years, few-shot action recognition has achieved remarkable performance through spatio-temporal relation modeling. Although a wide range of spatial and temporal alignment modules have been proposed, they primarily address spatial …
Transitive Vision-Language Prompt Learning for Domain Generalization
The vision-language pre-training has enabled deep models to make a huge step forward in generalizing across unseen domains. The recent learning method based on the vision-language pre-training model is a great tool for domain generalizatio…
Dynamically Anchored Prompting for Task-Imbalanced Continual Learning
Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning …
Spatial-Contextual Discrepancy Information Compensation for GAN Inversion
Most existing GAN inversion methods either achieve accurate reconstruction but lack editability or offer strong editability at the cost of fidelity. Hence, how to balance the distortion-editability trade-off is a significant challenge for …
Federated Learning with Extremely Noisy Clients via Negative Distillation
Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e…
Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification
The key of visible-infrared person re-identification (VIReID) lies in how to minimize the modality discrepancy between visible and infrared images. Existing methods mainly exploit the spatial information while ignoring the discriminative f…
MRCN: A Novel Modality Restitution and Compensation Network for Visible-Infrared Person Re-identification
Visible-infrared person re-identification (VI-ReID), which aims to search identities across different spectra, is a challenging task due to large cross-modality discrepancy between visible and infrared images. The key to reduce the discrep…
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images; it can significantly benefit scene understanding and other related downstream tasks. Existing SGG models often struggl…
PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition
Pedestrian attribute recognition (PAR) has received increasing attention because of its wide application in video surveillance and pedestrian analysis. Extracting robust feature representation is one of the key challenges in this task. The…
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
Deep neural networks have made huge progress in the last few decades. However, as the real-world data often exhibits a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this pro…
Personalized Federated Learning on Long-Tailed Data via Adversarial Feature Augmentation
Personalized Federated Learning (PFL) aims to learn personalized models for each client based on the knowledge across all clients in a privacy-preserving manner. Existing PFL methods generally assume that the underlying global data across …
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
For the visible-infrared person re-identification (VIReID) task, one of the major challenges is the modality gaps between visible (VIS) and infrared (IR) images. However, the training samples are usually limited, while the modality gaps ar…
Federated Semi-Supervised Learning with Annotation Heterogeneity
Federated Semi-Supervised Learning (FSSL) aims to learn a global model from different clients in an environment with both labeled and unlabeled data. Most of the existing FSSL work generally assumes that both types of data are available on…
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
The prosperity of deep learning contributes to the rapid progress in scene text detection. Among all the methods with convolutional networks, segmentation-based ones have drawn extensive attention due to their superiority in detecting text…
Label-Noise Learning with Intrinsically Long-Tailed Data
Label noise is one of the key factors that lead to the poor generalization of deep learning models. Existing label-noise learning methods usually assume that the ground-truth classes of the training data are balanced. However, the real-wor…
Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition
Most existing compound facial expression recognition (FER) methods rely on large-scale labeled compound expression data for training. However, collecting such data is labor-intensive and time-consuming. In this paper, we address the compou…
Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features
Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail dist…
When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework
Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scen…