What You See is What You Read? Improving Text-Image Alignment Evaluation
· 2023
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2305.10400
Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to-image and image-to-text generation tasks, with human judgements for whether a given text-image pair is semantically aligned. We then describe two automatic methods to determine alignment: the first involving a pipeline based on question generation and visual question answering models, and the second employing an end-to-end classification approach by finetuning multimodal pretrained models. Both methods surpass prior approaches in various text-image alignment tasks, with significant improvements in challenging cases that involve complex composition or unnatural images. Finally, we demonstrate how our approaches can localize specific misalignments between an image and a given text, and how they can be used to automatically re-rank candidates in text-to-image generation.
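The question-generation plus visual-question-answering pipeline and the re-ranking use case described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the averaging of per-question scores, and the toy stand-in models are all assumptions made for the example, and a real system would plug in actual question-generation and VQA models.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   generate_qa(text) -> list of (question, expected_answer) pairs
#   vqa_score(image, question, answer) -> support score in [0, 1]
QAGenerator = Callable[[str], List[Tuple[str, str]]]
VQAScorer = Callable[[object, str, str], float]


def alignment_score(text: str, image: object,
                    generate_qa: QAGenerator, vqa_score: VQAScorer) -> float:
    """Average VQA support over questions generated from the text.

    Averaging is an assumed aggregation choice for this sketch.
    """
    qa_pairs = generate_qa(text)
    if not qa_pairs:
        return 0.0
    return mean(vqa_score(image, q, a) for q, a in qa_pairs)


def rerank(text: str, images: Sequence[object],
           generate_qa: QAGenerator, vqa_score: VQAScorer):
    """Return (score, image) pairs sorted from best to worst aligned."""
    scored = [(alignment_score(text, img, generate_qa, vqa_score), img)
              for img in images]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy stand-ins so the sketch runs without any model: an "image" is just
# the set of object names it contains.
def toy_generate_qa(text: str) -> List[Tuple[str, str]]:
    return [(f"is there a {word}?", "yes") for word in text.split()]


def toy_vqa_score(image: set, question: str, answer: str) -> float:
    return 1.0 if any(obj in question for obj in image) else 0.0


candidates = [{"cat"}, {"dog", "ball"}, {"dog"}]
ranked = rerank("dog ball", candidates, toy_generate_qa, toy_vqa_score)
```

Keeping the per-question scores around, rather than only their average, is what would let such a pipeline point at the specific question (and so the specific part of the text) that the image fails to support.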
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2305.10400
- https://arxiv.org/pdf/2305.10400
- OA Status
- green
- Cited By
- 14
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4377121434
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4377121434 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2305.10400 (Digital Object Identifier)
- Title: What You See is What You Read? Improving Text-Image Alignment Evaluation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-05-17 (full publication date if available)
- Authors: Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, E. O. Ofek, Idan Szpektor (list of authors in order)
- Landing page: https://arxiv.org/abs/2305.10400 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2305.10400 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2305.10400 (direct OA link when available)
- Concepts: Computer science, Image (mathematics), Pipeline (software), Generative grammar, Set (abstract data type), Artificial intelligence, Rank (graph theory), Natural language processing, Generative model, Information retrieval, Pattern recognition (psychology), Mathematics, Combinatorics, Programming language (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 14 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 4, 2024: 8, 2023: 2 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4377121434 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2305.10400 |
| ids.doi | https://doi.org/10.48550/arxiv.2305.10400 |
| ids.openalex | https://openalex.org/W4377121434 |
| fwci | |
| type | preprint |
| title | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9878000020980835 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.980400025844574 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8051720857620239 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C115961682 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6927599310874939 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[1].display_name | Image (mathematics) |
| concepts[2].id | https://openalex.org/C43521106 |
| concepts[2].level | 2 |
| concepts[2].score | 0.672462522983551 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2165493 |
| concepts[2].display_name | Pipeline (software) |
| concepts[3].id | https://openalex.org/C39890363 |
| concepts[3].level | 2 |
| concepts[3].score | 0.630967378616333 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q36108 |
| concepts[3].display_name | Generative grammar |
| concepts[4].id | https://openalex.org/C177264268 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6050349473953247 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[4].display_name | Set (abstract data type) |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5877771973609924 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C164226766 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5805510878562927 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7293202 |
| concepts[6].display_name | Rank (graph theory) |
| concepts[7].id | https://openalex.org/C204321447 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4495115876197815 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[7].display_name | Natural language processing |
| concepts[8].id | https://openalex.org/C167966045 |
| concepts[8].level | 3 |
| concepts[8].score | 0.411314994096756 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q5532625 |
| concepts[8].display_name | Generative model |
| concepts[9].id | https://openalex.org/C23123220 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4099617898464203 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[9].display_name | Information retrieval |
| concepts[10].id | https://openalex.org/C153180895 |
| concepts[10].level | 2 |
| concepts[10].score | 0.39266088604927063 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[10].display_name | Pattern recognition (psychology) |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.06648227572441101 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C114614502 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[12].display_name | Combinatorics |
| concepts[13].id | https://openalex.org/C199360897 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[13].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8051720857620239 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/image |
| keywords[1].score | 0.6927599310874939 |
| keywords[1].display_name | Image (mathematics) |
| keywords[2].id | https://openalex.org/keywords/pipeline |
| keywords[2].score | 0.672462522983551 |
| keywords[2].display_name | Pipeline (software) |
| keywords[3].id | https://openalex.org/keywords/generative-grammar |
| keywords[3].score | 0.630967378616333 |
| keywords[3].display_name | Generative grammar |
| keywords[4].id | https://openalex.org/keywords/set |
| keywords[4].score | 0.6050349473953247 |
| keywords[4].display_name | Set (abstract data type) |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.5877771973609924 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/rank |
| keywords[6].score | 0.5805510878562927 |
| keywords[6].display_name | Rank (graph theory) |
| keywords[7].id | https://openalex.org/keywords/natural-language-processing |
| keywords[7].score | 0.4495115876197815 |
| keywords[7].display_name | Natural language processing |
| keywords[8].id | https://openalex.org/keywords/generative-model |
| keywords[8].score | 0.411314994096756 |
| keywords[8].display_name | Generative model |
| keywords[9].id | https://openalex.org/keywords/information-retrieval |
| keywords[9].score | 0.4099617898464203 |
| keywords[9].display_name | Information retrieval |
| keywords[10].id | https://openalex.org/keywords/pattern-recognition |
| keywords[10].score | 0.39266088604927063 |
| keywords[10].display_name | Pattern recognition (psychology) |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.06648227572441101 |
| keywords[11].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2305.10400 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2305.10400 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2305.10400 |
| locations[1].id | doi:10.48550/arxiv.2305.10400 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2305.10400 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5034783276 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2330-0378 |
| authorships[0].author.display_name | Michal Yarom |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yarom, Michal |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5068580969 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1185-6838 |
| authorships[1].author.display_name | Yonatan Bitton |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Bitton, Yonatan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5075518871 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4013-1190 |
| authorships[2].author.display_name | Soravit Changpinyo |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Changpinyo, Soravit |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5027223654 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Roee Aharoni |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Aharoni, Roee |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5071893787 |
| authorships[4].author.orcid | https://orcid.org/0009-0000-7227-6557 |
| authorships[4].author.display_name | Jonathan Herzig |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Herzig, Jonathan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5035078692 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7644-8459 |
| authorships[5].author.display_name | Oran Lang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Lang, Oran |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5078910984 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-6786-8774 |
| authorships[6].author.display_name | E. O. Ofek |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Ofek, Eran |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5026091724 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Idan Szpektor |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Szpektor, Idan |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2305.10400 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4365211920, https://openalex.org/W3014948380, https://openalex.org/W4380551139, https://openalex.org/W4317695495, https://openalex.org/W4387506531, https://openalex.org/W4238433571, https://openalex.org/W3174044702, https://openalex.org/W2967848559, https://openalex.org/W4299831724, https://openalex.org/W4283803360 |
| cited_by_count | 14 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 4 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 8 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2305.10400 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2305.10400 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2305.10400 |
| primary_location.id | pmh:oai:arXiv.org:2305.10400 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2305.10400 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2305.10400 |
| publication_date | 2023-05-17 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 6, 13, 42, 61, 80, 141 |
| abstract_inverted_index.In | 27 |
| abstract_inverted_index.We | 38, 68 |
| abstract_inverted_index.an | 95, 138 |
| abstract_inverted_index.be | 148 |
| abstract_inverted_index.by | 99 |
| abstract_inverted_index.in | 21, 109, 117, 154 |
| abstract_inverted_index.is | 12, 65 |
| abstract_inverted_index.on | 83 |
| abstract_inverted_index.or | 124 |
| abstract_inverted_index.to | 74, 150 |
| abstract_inverted_index.we | 30, 128 |
| abstract_inverted_index.and | 5, 24, 52, 86, 91, 140, 144 |
| abstract_inverted_index.are | 9 |
| abstract_inverted_index.can | 133, 147 |
| abstract_inverted_index.for | 16, 33, 59 |
| abstract_inverted_index.how | 130, 145 |
| abstract_inverted_index.our | 131 |
| abstract_inverted_index.the | 77, 92 |
| abstract_inverted_index.two | 71 |
| abstract_inverted_index.Both | 104 |
| abstract_inverted_index.both | 50 |
| abstract_inverted_index.from | 49 |
| abstract_inverted_index.pair | 64 |
| abstract_inverted_index.set, | 45 |
| abstract_inverted_index.text | 4 |
| abstract_inverted_index.that | 120 |
| abstract_inverted_index.then | 69 |
| abstract_inverted_index.they | 146 |
| abstract_inverted_index.this | 28 |
| abstract_inverted_index.used | 149 |
| abstract_inverted_index.with | 19, 56, 114 |
| abstract_inverted_index.based | 82 |
| abstract_inverted_index.cases | 119 |
| abstract_inverted_index.first | 39, 78 |
| abstract_inverted_index.given | 62, 142 |
| abstract_inverted_index.human | 57 |
| abstract_inverted_index.image | 8, 139 |
| abstract_inverted_index.prior | 107 |
| abstract_inverted_index.study | 31 |
| abstract_inverted_index.text, | 143 |
| abstract_inverted_index.work, | 29 |
| abstract_inverted_index.second | 93 |
| abstract_inverted_index.tasks, | 55, 113 |
| abstract_inverted_index.tasks. | 26 |
| abstract_inverted_index.visual | 87 |
| abstract_inverted_index.aligned | 11 |
| abstract_inverted_index.between | 137 |
| abstract_inverted_index.complex | 122 |
| abstract_inverted_index.images. | 126 |
| abstract_inverted_index.involve | 121 |
| abstract_inverted_index.methods | 32, 73, 105 |
| abstract_inverted_index.models, | 18, 90 |
| abstract_inverted_index.models. | 103 |
| abstract_inverted_index.re-rank | 152 |
| abstract_inverted_index.surpass | 106 |
| abstract_inverted_index.various | 110 |
| abstract_inverted_index.whether | 2, 60 |
| abstract_inverted_index.Finally, | 127 |
| abstract_inverted_index.SeeTRUE: | 41 |
| abstract_inverted_index.aligned. | 67 |
| abstract_inverted_index.approach | 98 |
| abstract_inverted_index.datasets | 48 |
| abstract_inverted_index.describe | 70 |
| abstract_inverted_index.localize | 134 |
| abstract_inverted_index.multiple | 47 |
| abstract_inverted_index.pipeline | 81 |
| abstract_inverted_index.question | 84, 88 |
| abstract_inverted_index.spanning | 46 |
| abstract_inverted_index.specific | 135 |
| abstract_inverted_index.alignment | 36, 112 |
| abstract_inverted_index.answering | 89 |
| abstract_inverted_index.automatic | 34, 72 |
| abstract_inverted_index.challenge | 15 |
| abstract_inverted_index.determine | 75 |
| abstract_inverted_index.employing | 94 |
| abstract_inverted_index.introduce | 40 |
| abstract_inverted_index.involving | 79 |
| abstract_inverted_index.unnatural | 125 |
| abstract_inverted_index.alignment: | 76 |
| abstract_inverted_index.approaches | 108, 132 |
| abstract_inverted_index.candidates | 153 |
| abstract_inverted_index.end-to-end | 96 |
| abstract_inverted_index.evaluation | 44 |
| abstract_inverted_index.finetuning | 100 |
| abstract_inverted_index.generation | 54, 85 |
| abstract_inverted_index.generative | 22 |
| abstract_inverted_index.judgements | 58 |
| abstract_inverted_index.multimodal | 101 |
| abstract_inverted_index.pretrained | 102 |
| abstract_inverted_index.text-image | 35, 63, 111 |
| abstract_inverted_index.challenging | 118 |
| abstract_inverted_index.composition | 123 |
| abstract_inverted_index.demonstrate | 129 |
| abstract_inverted_index.determining | 1 |
| abstract_inverted_index.evaluation. | 37 |
| abstract_inverted_index.generation. | 156 |
| abstract_inverted_index.significant | 14, 115 |
| abstract_inverted_index.applications | 20 |
| abstract_inverted_index.improvements | 116 |
| abstract_inverted_index.semantically | 10, 66 |
| abstract_inverted_index.Automatically | 0 |
| abstract_inverted_index.automatically | 151 |
| abstract_inverted_index.comprehensive | 43 |
| abstract_inverted_index.corresponding | 7 |
| abstract_inverted_index.image-to-text | 25, 53 |
| abstract_inverted_index.misalignments | 136 |
| abstract_inverted_index.text-to-image | 23, 51, 155 |
| abstract_inverted_index.classification | 97 |
| abstract_inverted_index.vision-language | 17 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile |
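
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. A short sketch of how the running text can be rebuilt from such a mapping is shown below; the function name `rebuild_abstract` is made up for this example, and the sample dictionary is a hand-truncated subset of the rows above (positions 0 to 8 only), not the full payload.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Invert a token -> positions mapping back into running text."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in position order; gaps in the positions are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated sample taken from the table rows above (positions 0-8 only).
sample_index = {
    "a": [3, 6],
    "and": [5],
    "text": [4],
    "image": [8],
    "whether": [2],
    "determining": [1],
    "corresponding": [7],
    "Automatically": [0],
}

opening_words = rebuild_abstract(sample_index)
```

Here `opening_words` holds the first nine words of the abstract, in the same order as the prose version earlier in the record.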