SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2501.11485
Detecting out-of-distribution (OOD) data is crucial in real-world machine learning applications, particularly in safety-critical domains. Existing methods often leverage language information from vision-language models (VLMs) to enhance OOD detection, improving confidence estimation through rich class-wise text information. However, when building an OOD detection score on in-distribution (ID) text-image affinity, existing works focus either on each ID class individually or on the whole ID label set, overlooking the inherent connections among ID classes. We find that semantic information shared across different ID classes is beneficial for effective OOD detection. We therefore investigate how well VLMs comprehend image-text relationships among semantically related ID labels and propose a novel post-hoc strategy called SimLabel. SimLabel enhances the separability between ID and OOD samples by establishing a more robust image-class similarity metric that considers consistency over a set of similar class labels. Extensive experiments demonstrate the superior performance of SimLabel on various zero-shot OOD detection benchmarks. The proposed method is also extended to various VLM backbones, demonstrating good generalization ability. Our demonstration and implementation code is available at: https://github.com/ShuZou-1/SimLabel.
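The abstract does not state the exact scoring formula, so the following is only an illustrative sketch of the general idea of scoring an image by its consistency over a set of similar class labels. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `similar_label_sets` and `consistency_score`, the choice of text-embedding cosine similarity to pick similar labels, the neighbour count `k`, and the mean-then-max aggregation. The embeddings are assumed to come from some VLM image and text encoder that is not shown here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def similar_label_sets(text_emb, k):
    """Hypothetical helper: for each class, return the indices of its k most
    similar *other* classes, measured by cosine similarity of text embeddings."""
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    sim = t @ t.T                      # (n_classes, n_classes)
    np.fill_diagonal(sim, -np.inf)     # exclude the class itself
    return np.argsort(-sim, axis=1)[:, :k]


def consistency_score(image_emb, text_emb, neighbors):
    """Hypothetical score: for each class, average the image-text cosine
    similarity over the class and its similar labels, then take the maximum
    over classes. Higher values suggest an in-distribution image."""
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    affinity = img @ txt.T             # (n_images, n_classes)
    n_classes = txt.shape[0]
    # Each row holds a class index followed by its similar-label indices.
    groups = np.concatenate([np.arange(n_classes)[:, None], neighbors], axis=1)
    per_class = affinity[:, groups].mean(axis=2)   # (n_images, n_classes)
    return per_class.max(axis=1)
```

In use, a threshold on the returned score would separate ID from OOD inputs; how that threshold is chosen, and whether mean aggregation is the right notion of consistency, are open design choices that this sketch does not settle.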
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2501.11485, https://arxiv.org/pdf/2501.11485
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4406744325
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4406744325 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2501.11485 (Digital Object Identifier)
- Title: SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-01-20 (full publication date if available)
- Authors: Shu Zou, Xinyu Tian, Qinyu Zhao, Zhaoyuan Yang, Jing Zhang (list of authors in order)
- Landing page: https://arxiv.org/abs/2501.11485 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2501.11485 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2501.11485 (direct OA link when available)
- Concepts: Consistency (knowledge bases), Computer science, Artificial intelligence, Natural language processing, Language model (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4406744325 |
| doi | https://doi.org/10.48550/arxiv.2501.11485 |
| ids.doi | https://doi.org/10.48550/arxiv.2501.11485 |
| ids.openalex | https://openalex.org/W4406744325 |
| fwci | |
| type | preprint |
| title | SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12016 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9807999730110168 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Web Data Mining and Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776436953 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7179067730903625 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q5163215 |
| concepts[0].display_name | Consistency (knowledge bases) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.555220901966095 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.49289751052856445 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.43384116888046265 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C137293760 |
| concepts[4].level | 2 |
| concepts[4].score | 0.41355347633361816 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[4].display_name | Language model |
| keywords[0].id | https://openalex.org/keywords/consistency |
| keywords[0].score | 0.7179067730903625 |
| keywords[0].display_name | Consistency (knowledge bases) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.555220901966095 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.49289751052856445 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.43384116888046265 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/language-model |
| keywords[4].score | 0.41355347633361816 |
| keywords[4].display_name | Language model |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2501.11485 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2501.11485 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2501.11485 |
| locations[1].id | doi:10.48550/arxiv.2501.11485 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2501.11485 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101292475 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Shu Zou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zou, Shu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5029944382 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1247-6076 |
| authorships[1].author.display_name | Xinyu Tian |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tian, Xinyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101153090 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0245-1676 |
| authorships[2].author.display_name | Qinyu Zhao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhao, Qinyu |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5103063628 |
| authorships[3].author.orcid | https://orcid.org/0009-0007-0294-4741 |
| authorships[3].author.display_name | Zhaoyuan Yang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yang, Zhaoyuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101681014 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2019-225X |
| authorships[4].author.display_name | Jing Zhang |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Zhang, Jing |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2501.11485 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12016 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9807999730110168 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Web Data Mining and Analysis |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W3204019825 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2501.11485 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2501.11485 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2501.11485 |
| primary_location.id | pmh:oai:arXiv.org:2501.11485 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2501.11485 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2501.11485 |
| publication_date | 2025-01-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 101, 118, 128 |
| abstract_inverted_index.ID | 56, 60, 65, 76, 95, 112 |
| abstract_inverted_index.We | 68, 84 |
| abstract_inverted_index.by | 29, 116 |
| abstract_inverted_index.in | 6, 12, 97 |
| abstract_inverted_index.is | 4, 78, 151 |
| abstract_inverted_index.of | 89, 130, 140 |
| abstract_inverted_index.on | 45, 54, 142 |
| abstract_inverted_index.or | 58 |
| abstract_inverted_index.to | 25, 154 |
| abstract_inverted_index.OOD | 27, 41, 82, 114, 145 |
| abstract_inverted_index.Our | 162 |
| abstract_inverted_index.The | 148 |
| abstract_inverted_index.and | 99, 113, 164 |
| abstract_inverted_index.are | 167 |
| abstract_inverted_index.at: | 169 |
| abstract_inverted_index.for | 80 |
| abstract_inverted_index.its | 158 |
| abstract_inverted_index.set | 129 |
| abstract_inverted_index.the | 71, 87, 109, 137 |
| abstract_inverted_index.(ID) | 47 |
| abstract_inverted_index.VLMs | 98 |
| abstract_inverted_index.also | 152 |
| abstract_inverted_index.data | 3 |
| abstract_inverted_index.each | 55 |
| abstract_inverted_index.find | 69 |
| abstract_inverted_index.from | 21 |
| abstract_inverted_index.good | 159 |
| abstract_inverted_index.more | 119 |
| abstract_inverted_index.over | 127 |
| abstract_inverted_index.rich | 34 |
| abstract_inverted_index.text | 36 |
| abstract_inverted_index.that | 70, 124 |
| abstract_inverted_index.thus | 85 |
| abstract_inverted_index.upon | 44 |
| abstract_inverted_index.when | 39 |
| abstract_inverted_index.(OOD) | 2 |
| abstract_inverted_index.among | 92 |
| abstract_inverted_index.class | 57, 132 |
| abstract_inverted_index.codes | 166 |
| abstract_inverted_index.focus | 53 |
| abstract_inverted_index.label | 61 |
| abstract_inverted_index.model | 150 |
| abstract_inverted_index.novel | 102 |
| abstract_inverted_index.often | 17 |
| abstract_inverted_index.score | 43 |
| abstract_inverted_index.sets, | 62 |
| abstract_inverted_index.whole | 59 |
| abstract_inverted_index.works | 51 |
| abstract_inverted_index.(VLMs) | 24 |
| abstract_inverted_index.across | 74 |
| abstract_inverted_index.called | 105 |
| abstract_inverted_index.either | 52 |
| abstract_inverted_index.labels | 96 |
| abstract_inverted_index.metric | 123 |
| abstract_inverted_index.models | 23 |
| abstract_inverted_index.robust | 120 |
| abstract_inverted_index.ability | 88 |
| abstract_inverted_index.between | 111 |
| abstract_inverted_index.classes | 77 |
| abstract_inverted_index.crucial | 5 |
| abstract_inverted_index.enhance | 26 |
| abstract_inverted_index.labels. | 133 |
| abstract_inverted_index.machine | 8 |
| abstract_inverted_index.methods | 16 |
| abstract_inverted_index.propose | 100 |
| abstract_inverted_index.samples | 115 |
| abstract_inverted_index.similar | 131 |
| abstract_inverted_index.through | 33 |
| abstract_inverted_index.various | 143, 155 |
| abstract_inverted_index.Existing | 15 |
| abstract_inverted_index.However, | 38 |
| abstract_inverted_index.SimLabel | 107, 141 |
| abstract_inverted_index.ability. | 161 |
| abstract_inverted_index.building | 40 |
| abstract_inverted_index.classes' | 66 |
| abstract_inverted_index.domains. | 14 |
| abstract_inverted_index.enhances | 108 |
| abstract_inverted_index.existing | 50 |
| abstract_inverted_index.extended | 153 |
| abstract_inverted_index.inherent | 64 |
| abstract_inverted_index.language | 19 |
| abstract_inverted_index.learning | 9 |
| abstract_inverted_index.leverage | 18 |
| abstract_inverted_index.post-hoc | 103 |
| abstract_inverted_index.proposed | 149 |
| abstract_inverted_index.semantic | 72 |
| abstract_inverted_index.strategy | 104 |
| abstract_inverted_index.superior | 138 |
| abstract_inverted_index.Detecting | 0 |
| abstract_inverted_index.Extensive | 134 |
| abstract_inverted_index.SimLabel. | 106 |
| abstract_inverted_index.affinity, | 49 |
| abstract_inverted_index.available | 168 |
| abstract_inverted_index.considers | 125 |
| abstract_inverted_index.detection | 28, 42, 146 |
| abstract_inverted_index.different | 75, 93 |
| abstract_inverted_index.effective | 81 |
| abstract_inverted_index.improving | 30 |
| abstract_inverted_index.zero-shot | 144 |
| abstract_inverted_index.beneficial | 79 |
| abstract_inverted_index.class-wise | 35 |
| abstract_inverted_index.confidence | 31 |
| abstract_inverted_index.detection. | 83 |
| abstract_inverted_index.estimation | 32 |
| abstract_inverted_index.image-text | 90 |
| abstract_inverted_index.real-world | 7 |
| abstract_inverted_index.similarity | 122 |
| abstract_inverted_index.text-image | 48 |
| abstract_inverted_index.benchmarks. | 147 |
| abstract_inverted_index.connection. | 67 |
| abstract_inverted_index.consistency | 126 |
| abstract_inverted_index.demonstrate | 136 |
| abstract_inverted_index.experiments | 135 |
| abstract_inverted_index.image-class | 121 |
| abstract_inverted_index.information | 20, 73 |
| abstract_inverted_index.investigate | 86 |
| abstract_inverted_index.overlooking | 63 |
| abstract_inverted_index.performance | 139 |
| abstract_inverted_index.establishing | 117 |
| abstract_inverted_index.information. | 37 |
| abstract_inverted_index.particularly | 11 |
| abstract_inverted_index.separability | 110 |
| abstract_inverted_index.applications, | 10 |
| abstract_inverted_index.comprehension | 91 |
| abstract_inverted_index.demonstrating | 157 |
| abstract_inverted_index.demonstration | 163 |
| abstract_inverted_index.VLM-backbones, | 156 |
| abstract_inverted_index.generalization | 160 |
| abstract_inverted_index.implementation | 165 |
| abstract_inverted_index.in-distribution | 46 |
| abstract_inverted_index.safety-critical | 13 |
| abstract_inverted_index.vision-language | 22 |
| abstract_inverted_index.semantic-related | 94 |
| abstract_inverted_index.out-of-distribution | 1 |
| abstract_inverted_index.https://github.com/ShuZou-1/SimLabel. | 170 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |