I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
2022 · Open Access
DOI: https://doi.org/10.48550/arxiv.2209.10304
Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale. An unsupervised alternative is to represent each class using the word embedding associated with its semantic class name. However, word embeddings extracted from pre-trained language models do not necessarily capture visual similarities, resulting in poor zero-shot performance. In this work, we argue that online textual documents, e.g., Wikipedia, contain rich visual descriptions of object classes and can therefore serve as powerful unsupervised side information for ZSL. To this end, we propose I2DFormer, a novel transformer-based ZSL framework that jointly learns to encode images and documents by aligning both modalities in a shared embedding space. To distill discriminative visual words from noisy documents, we introduce a new cross-modal attention module that learns fine-grained interactions between image patches and document words. Consequently, our I2DFormer not only learns highly discriminative document embeddings that capture visual similarities but also gains the ability to localize visually relevant words in image regions. Quantitatively, we demonstrate that I2DFormer significantly outperforms previous unsupervised semantic embeddings under both the zero-shot and generalized zero-shot learning settings on three public datasets. Qualitatively, we show that our method yields highly interpretable results in which document words can be grounded in image regions.
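No code accompanies this record, but the mechanism the abstract describes lends itself to a short sketch. Below is a minimal, illustrative PyTorch example (not the authors' implementation) of cross-modal attention in which image patch tokens attend over document word tokens to produce an image-document compatibility score; the module name, dimensions, and mean-pooling choice are all assumptions.

```python
# Minimal sketch (not the authors' code): score an image against a class
# document via cross-attention from image patches to document words.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class ImageToDocAttention(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, patch_tokens, word_tokens):
        # patch_tokens: (B, P, dim) image patch embeddings
        # word_tokens:  (B, W, dim) document word embeddings
        # Each patch queries the document words; the attention weights
        # indicate which words explain which image regions.
        attended, attn_weights = self.attn(
            query=patch_tokens, key=word_tokens, value=word_tokens
        )
        # Pool over patches and map to a scalar image-document score.
        compat = self.score(attended.mean(dim=1)).squeeze(-1)  # (B,)
        return compat, attn_weights

# Usage: rank candidate class documents for one image.
model = ImageToDocAttention()
patches = torch.randn(1, 196, 256)                    # e.g. 14x14 ViT patches
docs = [torch.randn(1, 300, 256) for _ in range(5)]   # 5 class documents
scores = torch.cat([model(patches, d)[0] for d in docs])  # (5,)
pred = scores.argmax().item()  # index of the best-matching class document
```

At inference, zero-shot classification reduces to scoring the test image against every unseen class's document and taking the argmax, which is what the ranking loop above mimics.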
Overview
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2209.10304
- https://arxiv.org/pdf/2209.10304
- OA Status
- green
- Cited By
- 21
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4297899821
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4297899821 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.48550/arxiv.2209.10304 (Digital Object Identifier)
- Title
- I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification (work title)
- Type
- preprint (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2022 (year of publication)
- Publication date
- 2022-09-21 (full publication date if available)
- Authors
- Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari (list of authors in order)
- Landing page
- https://arxiv.org/abs/2209.10304 (publisher landing page)
- PDF URL
- https://arxiv.org/pdf/2209.10304 (direct link to full-text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- green (open access status per OpenAlex)
- OA URL
- https://arxiv.org/pdf/2209.10304 (direct OA link when available)
- Concepts
- Computer science, Discriminative model, Embedding, Artificial intelligence, Image (mathematics), Class (philosophy), Pattern recognition (psychology), Natural language processing, Word (group theory), Contextual image classification, Machine learning, Mathematics, Geometry (top concepts/fields attached by OpenAlex)
- Cited by
- 21 (total citation count in OpenAlex)
- Citations by year (recent)
- 2025: 6, 2024: 11, 2023: 4 (per-year citation counts, last 5 years)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
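The full payload below can also be retrieved live from the OpenAlex REST API using the standard works endpoint for this record. A short sketch (the mailto address is a placeholder for the optional "polite pool" parameter):

```python
# Fetch this record from the OpenAlex REST API (no auth required).
# The field names below match the payload table in this section.
import requests

resp = requests.get(
    "https://api.openalex.org/works/W4297899821",
    params={"mailto": "you@example.com"},  # placeholder address
    timeout=30,
)
resp.raise_for_status()
work = resp.json()

print(work["display_name"])           # title
print(work["publication_date"])       # 2022-09-21
print(work["cited_by_count"])         # citation count at query time
print(work["open_access"]["oa_url"])  # best open-access link
```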
Full payload
| id | https://openalex.org/W4297899821 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2209.10304 |
| ids.doi | https://doi.org/10.48550/arxiv.2209.10304 |
| ids.openalex | https://openalex.org/W4297899821 |
| fwci | |
| type | preprint |
| title | I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11307 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9951000213623047 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Domain Adaptation and Few-Shot Learning |
| topics[1].id | https://openalex.org/T11714 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9945999979972839 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Multimodal Machine Learning Applications |
| topics[2].id | https://openalex.org/T11775 |
| topics[2].field.id | https://openalex.org/fields/27 |
| topics[2].field.display_name | Medicine |
| topics[2].score | 0.9782000184059143 |
| topics[2].domain.id | https://openalex.org/domains/4 |
| topics[2].domain.display_name | Health Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2741 |
| topics[2].subfield.display_name | Radiology, Nuclear Medicine and Imaging |
| topics[2].display_name | COVID-19 diagnosis using AI |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7771740555763245 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C97931131 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7698386311531067 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5282087 |
| concepts[1].display_name | Discriminative model |
| concepts[2].id | https://openalex.org/C41608201 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7020485401153564 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q980509 |
| concepts[2].display_name | Embedding |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6568190455436707 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C115961682 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4903245270252228 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[4].display_name | Image (mathematics) |
| concepts[5].id | https://openalex.org/C2777212361 |
| concepts[5].level | 2 |
| concepts[5].score | 0.48544055223464966 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5127848 |
| concepts[5].display_name | Class (philosophy) |
| concepts[6].id | https://openalex.org/C153180895 |
| concepts[6].level | 2 |
| concepts[6].score | 0.46055567264556885 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[6].display_name | Pattern recognition (psychology) |
| concepts[7].id | https://openalex.org/C204321447 |
| concepts[7].level | 1 |
| concepts[7].score | 0.45940107107162476 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[7].display_name | Natural language processing |
| concepts[8].id | https://openalex.org/C90805587 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4484359622001648 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q10944557 |
| concepts[8].display_name | Word (group theory) |
| concepts[9].id | https://openalex.org/C75294576 |
| concepts[9].level | 3 |
| concepts[9].score | 0.41496947407722473 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q5165192 |
| concepts[9].display_name | Contextual image classification |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3617013692855835 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.10658711194992065 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C2524010 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[12].display_name | Geometry |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7771740555763245 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/discriminative-model |
| keywords[1].score | 0.7698386311531067 |
| keywords[1].display_name | Discriminative model |
| keywords[2].id | https://openalex.org/keywords/embedding |
| keywords[2].score | 0.7020485401153564 |
| keywords[2].display_name | Embedding |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.6568190455436707 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/image |
| keywords[4].score | 0.4903245270252228 |
| keywords[4].display_name | Image (mathematics) |
| keywords[5].id | https://openalex.org/keywords/class |
| keywords[5].score | 0.48544055223464966 |
| keywords[5].display_name | Class (philosophy) |
| keywords[6].id | https://openalex.org/keywords/pattern-recognition |
| keywords[6].score | 0.46055567264556885 |
| keywords[6].display_name | Pattern recognition (psychology) |
| keywords[7].id | https://openalex.org/keywords/natural-language-processing |
| keywords[7].score | 0.45940107107162476 |
| keywords[7].display_name | Natural language processing |
| keywords[8].id | https://openalex.org/keywords/word |
| keywords[8].score | 0.4484359622001648 |
| keywords[8].display_name | Word (group theory) |
| keywords[9].id | https://openalex.org/keywords/contextual-image-classification |
| keywords[9].score | 0.41496947407722473 |
| keywords[9].display_name | Contextual image classification |
| keywords[10].id | https://openalex.org/keywords/machine-learning |
| keywords[10].score | 0.3617013692855835 |
| keywords[10].display_name | Machine learning |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.10658711194992065 |
| keywords[11].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2209.10304 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2209.10304 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2209.10304 |
| locations[1].id | doi:10.48550/arxiv.2209.10304 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2209.10304 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5103091877 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7455-7280 |
| authorships[0].author.display_name | Muhammad Ferjad Naeem |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Naeem, Muhammad Ferjad |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5012209802 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7186-1295 |
| authorships[1].author.display_name | Yongqin Xian |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xian, Yongqin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5001254143 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3445-5711 |
| authorships[2].author.display_name | Luc Van Gool |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Van Gool, Luc |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5041092666 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5598-5212 |
| authorships[3].author.display_name | Federico Tombari |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Tombari, Federico |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2209.10304 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11307 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9951000213623047 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Domain Adaptation and Few-Shot Learning |
| related_works | https://openalex.org/W4389116644, https://openalex.org/W2153315159, https://openalex.org/W3103844505, https://openalex.org/W259157601, https://openalex.org/W4205463238, https://openalex.org/W2312145515, https://openalex.org/W2761785940, https://openalex.org/W4252364083, https://openalex.org/W4210657415, https://openalex.org/W2129933262 |
| cited_by_count | 21 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 6 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 11 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 4 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2209.10304 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2209.10304 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2209.10304 |
| primary_location.id | pmh:oai:arXiv.org:2209.10304 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2209.10304 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2209.10304 |
| publication_date | 2022-09-21 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions map of the abstract; the full abstract text is given at the top of this page) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.7099999785423279 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| citation_normalized_percentile |
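OpenAlex stores abstracts as an abstract_inverted_index, a map from each word to its 0-based token positions (condensed in the table above, since the plain abstract appears at the top of this page). A minimal sketch to un-invert such a map; the helper name is hypothetical:

```python
# Reconstruct plain abstract text from an OpenAlex abstract_inverted_index
# (a dict mapping each word to its 0-based positions in the abstract).
def uninvert(index: dict) -> str:
    positions = {}
    for word, locs in index.items():
        for pos in locs:
            positions[pos] = word
    # Sort positions and join the words back into running text.
    return " ".join(positions[i] for i in sorted(positions))

# Tiny example with the field's actual structure:
sample = {"Despite": [0], "the": [1], "tremendous": [2], "progress": [3]}
print(uninvert(sample))  # -> "Despite the tremendous progress"
```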