arXiv (Cornell University)
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
May 2025 • Eric P. Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs
Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing CIR methods struggle to accurately represent both the image and the text modification, which degrades retrieval performance. To address this limitation, we introduce a CIR framework, ConText-CIR, trained with a Text Concept-Consistency loss that encourages the representations of noun phrases in the text modification to better attend to t…
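
The CIR task setup described in the abstract (a query image plus a modification text, used to retrieve a target image) can be illustrated with a minimal sketch. This is not the ConText-CIR method or its Text Concept-Consistency loss, whose details are not given here. It assumes embeddings have already been produced by some image and text encoders, and it uses a hypothetical fusion rule (sum of normalized embeddings) with cosine-similarity ranking. The function names and the toy 3-dimensional vectors are illustrative assumptions only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def compose_query(image_emb, text_emb):
    """Hypothetical fusion: add the normalized image and text embeddings,
    then renormalize. Real CIR systems learn this composition."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return l2_normalize([a + b for a, b in zip(img, txt)])


def rank_gallery(query_emb, gallery):
    """Return gallery names sorted by cosine similarity to the query."""
    scores = {
        name: sum(q * g for q, g in zip(query_emb, l2_normalize(emb)))
        for name, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed, not from any real encoder).
query_image = [1.0, 0.0, 0.0]        # the reference image
modification_text = [0.0, 1.0, 0.0]  # the requested semantic change
gallery = {
    "unchanged": [1.0, 0.0, 0.0],    # same as the query image
    "modified": [1.0, 1.0, 0.0],     # query image plus the change
    "unrelated": [0.0, 0.0, 1.0],
}

ranking = rank_gallery(compose_query(query_image, modification_text), gallery)
print(ranking)
```

In this toy setting the gallery item that combines the query image's content with the requested change ranks first, which is the behavior a CIR system is meant to achieve.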