ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2505.20764
Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing methods in CIR struggle to accurately represent the image and the text modification, resulting in subpar performance. To address this limitation, we introduce a CIR framework, ConText-CIR, trained with a Text Concept-Consistency loss that encourages the representations of noun phrases in the text modification to better attend to the relevant parts of the query image. To support training with this loss function, we also propose a synthetic data generation pipeline that creates training data from existing CIR datasets or unlabeled images. We show that these components together enable stronger performance on CIR tasks, setting a new state-of-the-art in composed image retrieval in both the supervised and zero-shot settings on multiple benchmark datasets, including CIRR and CIRCO. Source code, model checkpoints, and our new datasets are available at https://github.com/mvrl/ConText-CIR.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2505.20764
- PDF URL: https://arxiv.org/pdf/2505.20764
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415036263
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415036263 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2505.20764 (Digital Object Identifier)
- Title: ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-05-27 (full publication date if available)
- Authors: Eric P. Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs (list of authors in order)
- Landing page: https://arxiv.org/abs/2505.20764 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2505.20764 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2505.20764 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4415036263 |
| doi | https://doi.org/10.48550/arxiv.2505.20764 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.20764 |
| ids.openalex | https://openalex.org/W4415036263 |
| fwci | |
| type | preprint |
| title | ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10824 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9713000059127808 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Image Retrieval and Classification Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.20764 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-sa |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.20764 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-sa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.20764 |
| locations[1].id | doi:10.48550/arxiv.2505.20764 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.20764 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5009547049 |
| authorships[0].author.orcid | https://orcid.org/0009-0005-9158-4201 |
| authorships[0].author.display_name | Eric P. Xing |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xing, Eric |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5119282075 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Pranavi Kolouju |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kolouju, Pranavi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5051490260 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5775-8216 |
| authorships[2].author.display_name | Robert Pless |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Pless, Robert |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5069295854 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4387-028X |
| authorships[3].author.display_name | Abby Stylianou |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Stylianou, Abby |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5029557305 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4242-8967 |
| authorships[4].author.display_name | Nathan Jacobs |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Jacobs, Nathan |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.20764 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10824 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9713000059127808 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Image Retrieval and Classification Techniques |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2505.20764 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-sa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2505.20764 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2505.20764 |
| primary_location.id | pmh:oai:arXiv.org:2505.20764 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-sa |
| primary_location.pdf_url | https://arxiv.org/pdf/2505.20764 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2505.20764 |
| publication_date | 2025-05-27 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 9, 14, 18, 23, 54, 60, 96, 125 |
| abstract_inverted_index.To | 48, 86 |
| abstract_inverted_index.We | 112 |
| abstract_inverted_index.at | 157 |
| abstract_inverted_index.by | 13 |
| abstract_inverted_index.in | 32, 45, 71, 128, 132 |
| abstract_inverted_index.is | 4 |
| abstract_inverted_index.of | 7, 68, 82 |
| abstract_inverted_index.on | 121, 139 |
| abstract_inverted_index.or | 109 |
| abstract_inverted_index.to | 26, 35, 75, 78 |
| abstract_inverted_index.we | 52, 93 |
| abstract_inverted_index.CIR | 33, 55, 107, 122 |
| abstract_inverted_index.and | 17, 40, 136, 145, 151 |
| abstract_inverted_index.are | 155 |
| abstract_inverted_index.new | 126, 153 |
| abstract_inverted_index.our | 152 |
| abstract_inverted_index.the | 5, 27, 38, 41, 66, 72, 79, 83, 134 |
| abstract_inverted_index.CIRR | 144 |
| abstract_inverted_index.Text | 61 |
| abstract_inverted_index.also | 94 |
| abstract_inverted_index.both | 133 |
| abstract_inverted_index.data | 98, 104 |
| abstract_inverted_index.from | 105 |
| abstract_inverted_index.loss | 63, 91 |
| abstract_inverted_index.noun | 69 |
| abstract_inverted_index.show | 113 |
| abstract_inverted_index.task | 6 |
| abstract_inverted_index.text | 20, 42, 73 |
| abstract_inverted_index.that | 21, 64, 101, 114 |
| abstract_inverted_index.this | 50, 90 |
| abstract_inverted_index.with | 59, 89 |
| abstract_inverted_index.(CIR) | 3 |
| abstract_inverted_index.code, | 148 |
| abstract_inverted_index.image | 1, 11, 16, 39, 130 |
| abstract_inverted_index.model | 149 |
| abstract_inverted_index.parts | 81 |
| abstract_inverted_index.query | 15, 28, 84 |
| abstract_inverted_index.these | 115 |
| abstract_inverted_index.CIRCO. | 146 |
| abstract_inverted_index.Source | 147 |
| abstract_inverted_index.attend | 77 |
| abstract_inverted_index.better | 76 |
| abstract_inverted_index.enable | 118 |
| abstract_inverted_index.image. | 29, 85 |
| abstract_inverted_index.subpar | 46 |
| abstract_inverted_index.target | 10 |
| abstract_inverted_index.tasks, | 123 |
| abstract_inverted_index.address | 49 |
| abstract_inverted_index.creates | 102 |
| abstract_inverted_index.images. | 111 |
| abstract_inverted_index.methods | 31 |
| abstract_inverted_index.phrases | 70 |
| abstract_inverted_index.propose | 95 |
| abstract_inverted_index.setting | 124 |
| abstract_inverted_index.support | 87 |
| abstract_inverted_index.trained | 58 |
| abstract_inverted_index.Composed | 0 |
| abstract_inverted_index.Existing | 30 |
| abstract_inverted_index.composed | 129 |
| abstract_inverted_index.datasets | 108, 154 |
| abstract_inverted_index.existing | 106 |
| abstract_inverted_index.multiple | 140 |
| abstract_inverted_index.pipeline | 100 |
| abstract_inverted_index.relative | 19 |
| abstract_inverted_index.relevant | 80 |
| abstract_inverted_index.semantic | 24 |
| abstract_inverted_index.settings | 138 |
| abstract_inverted_index.stronger | 119 |
| abstract_inverted_index.struggle | 34 |
| abstract_inverted_index.together | 117 |
| abstract_inverted_index.training | 88, 103 |
| abstract_inverted_index.available | 156 |
| abstract_inverted_index.benchmark | 141 |
| abstract_inverted_index.datasets, | 142 |
| abstract_inverted_index.describes | 22 |
| abstract_inverted_index.function, | 92 |
| abstract_inverted_index.including | 143 |
| abstract_inverted_index.introduce | 53 |
| abstract_inverted_index.represent | 37 |
| abstract_inverted_index.resulting | 44 |
| abstract_inverted_index.retrieval | 2, 131 |
| abstract_inverted_index.specified | 12 |
| abstract_inverted_index.synthetic | 97 |
| abstract_inverted_index.unlabeled | 110 |
| abstract_inverted_index.zero-shot | 137 |
| abstract_inverted_index.accurately | 36 |
| abstract_inverted_index.components | 116 |
| abstract_inverted_index.encourages | 65 |
| abstract_inverted_index.framework, | 56 |
| abstract_inverted_index.generation | 99 |
| abstract_inverted_index.retrieving | 8 |
| abstract_inverted_index.supervised | 135 |
| abstract_inverted_index.limitation, | 51 |
| abstract_inverted_index.performance | 120 |
| abstract_inverted_index.ConText-CIR, | 57 |
| abstract_inverted_index.checkpoints, | 150 |
| abstract_inverted_index.modification | 25, 74 |
| abstract_inverted_index.performance. | 47 |
| abstract_inverted_index.modification, | 43 |
| abstract_inverted_index.representations | 67 |
| abstract_inverted_index.state-of-the-art | 127 |
| abstract_inverted_index.Concept-Consistency | 62 |
| abstract_inverted_index.https://github.com/mvrl/ConText-CIR. | 158 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
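
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the zero-based positions where it occurs. The following is a minimal sketch of how such a mapping can be turned back into running text. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of the record; `sample` holds only the entries for positions 0 through 7, copied from the table.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    # Flatten the mapping into (position, token) pairs.
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(token for _, token in positioned)


# Hypothetical sample: only positions 0-7 from the table above.
sample = {
    "Composed": [0],
    "image": [1],
    "retrieval": [2],
    "(CIR)": [3],
    "is": [4],
    "the": [5],
    "task": [6],
    "of": [7],
}

print(rebuild_abstract(sample))
# prints "Composed image retrieval (CIR) is the task of"
```

Applying the same function to the full set of `abstract_inverted_index` rows should reproduce the abstract shown at the top of the page, since tokens keep their original punctuation and capitalization.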