Text to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence • Vol 47 • No 9 • May 2025
Chun-Mei Feng, Kai Yu, Xinxing Xu, Salman Khan, Rick Siow Mong Goh, Wangmeng Zuo, Yong Liu

Benefiting from image-text contrastive learning, pre-trained vision-language models such as CLIP allow texts to be used directly as images (TaI) for parameter-efficient fine-tuning (PEFT). Although CLIP can make image features similar to the corresponding text features, the modality gap remains a nontrivial issue and limits the image recognition performance of TaI. Using multi-label image recognition (MLR) as an example, we present a novel method, called T2I-PAL, to tackle the modality gap issue when using…

Artificial Intelligence • Computer Science • Computer Vision • Engineering • Architectural Engineering