CLIPPO: Image-and-Language Understanding from Pixels Only
2022 · Open Access
DOI: https://doi.org/10.48550/arxiv.2212.08045
Multimodal models are becoming increasingly effective, in part due to unified components, such as the Transformer architecture. However, multimodal models still often consist of many task- and modality-specific pieces and training procedures. For example, CLIP (Radford et al., 2021) trains independent text and image towers via a contrastive loss. We explore an additional unification: the use of a pure pixel-based model to perform image, text, and multimodal tasks. Our model is trained with contrastive loss alone, so we call it CLIP-Pixels Only (CLIPPO). CLIPPO uses a single encoder that processes both regular images and text rendered as images. CLIPPO performs image-based tasks such as retrieval and zero-shot image classification almost as well as CLIP-style models, with half the number of parameters and no text-specific tower or embedding. When trained jointly via image-text contrastive learning and next-sentence contrastive learning, CLIPPO can perform well on natural language understanding tasks, without any word-level loss (language modelling or masked language modelling), outperforming pixel-based prior work. Surprisingly, CLIPPO can obtain good accuracy in visual question answering, simply by rendering the question and image together. Finally, we exploit the fact that CLIPPO does not require a tokenizer to show that it can achieve strong performance on multilingual multimodal retrieval without modifications.
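The core mechanism in the abstract is concrete enough to sketch: one shared encoder embeds both regular images and captions that have been rasterized into images, and the two embeddings are tied together with a CLIP-style contrastive loss. Below is a minimal, illustrative sketch of that idea in Python, assuming PyTorch, NumPy, and Pillow; the tiny convolutional encoder, the 224-pixel canvas, the temperature, and the helper names (render_text, TinyEncoder, contrastive_loss) are placeholder assumptions, not the authors' implementation (CLIPPO itself uses a ViT encoder).

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image, ImageDraw


def render_text(text: str, size: int = 224) -> torch.Tensor:
    """Render a caption onto a blank image so text and photos share one input space."""
    img = Image.new("RGB", (size, size), "white")
    ImageDraw.Draw(img).text((4, 4), text, fill="black")
    arr = torch.from_numpy(np.asarray(img, dtype=np.uint8).copy())
    return arr.float().permute(2, 0, 1) / 255.0  # (3, H, W) in [0, 1]


class TinyEncoder(nn.Module):
    """Stand-in for CLIPPO's single shared tower (the paper uses a ViT)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.patchify = nn.Conv2d(3, 32, kernel_size=16, stride=16)
        self.head = nn.Linear(32, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.patchify(x).flatten(2).mean(-1)  # mean-pool patch features
        return F.normalize(self.head(feats), dim=-1)


def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temp: float = 0.07):
    """Symmetric CLIP-style loss: matching (image, rendered caption) pairs attract."""
    logits = img_emb @ txt_emb.t() / temp
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


encoder = TinyEncoder()
images = torch.rand(4, 3, 224, 224)  # placeholder photo batch
captions = torch.stack([render_text(f"a photo of object {i}") for i in range(4)])
loss = contrastive_loss(encoder(images), encoder(captions))  # same weights embed both
loss.backward()
```

The one detail that matters is that a single set of weights embeds both the photo batch and the rendered captions; everything else in the sketch is scaffolding around the unification the abstract highlights.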
Record details
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2212.08045
- PDF: https://arxiv.org/pdf/2212.08045
- OA status: green
- Cited by: 1
- Related works: 10
- OpenAlex ID: https://openalex.org/W4311730904
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4311730904 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2212.08045
- Title: CLIPPO: Image-and-Language Understanding from Pixels Only
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022
- Publication date: 2022-12-15
- Authors: Michael Tschannen, Basil Mustafa, Neil Houlsby (in order)
- Landing page: https://arxiv.org/abs/2212.08045 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2212.08045 (direct link to the full-text PDF)
- Open access: yes
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2212.08045 (direct OA link)
- Concepts (top concepts attached by OpenAlex): Computer science, Artificial intelligence, Encoder, Rendering (computer graphics), Natural language processing, Pixel, Language model, Sentence, Embedding, Transformer, Exploit, Natural language, Computer security, Physics, Voltage, Operating system, Quantum mechanics
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1
- Related works (count): 10 (works algorithmically related by OpenAlex)
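The fields above and the payload below describe a single OpenAlex work record. For reference, a record like this can be fetched from the public OpenAlex works endpoint; the sketch below assumes the requests library, and fetch_openalex_work is an illustrative helper name.

```python
import requests

# Illustrative helper: fetch one work record from the public OpenAlex API.
# W4311730904 is the OpenAlex ID shown above for this paper.
def fetch_openalex_work(work_id: str) -> dict:
    resp = requests.get(f"https://api.openalex.org/works/{work_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()


work = fetch_openalex_work("W4311730904")
print(work["display_name"])           # CLIPPO: Image-and-Language Understanding from Pixels Only
print(work["publication_date"])       # 2022-12-15
print(work["open_access"]["oa_url"])  # https://arxiv.org/pdf/2212.08045
```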
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4311730904 |
| doi | https://doi.org/10.48550/arxiv.2212.08045 |
| ids.doi | https://doi.org/10.48550/arxiv.2212.08045 |
| ids.openalex | https://openalex.org/W4311730904 |
| fwci | |
| type | preprint |
| title | CLIPPO: Image-and-Language Understanding from Pixels Only |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9921000003814697 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9879999756813049 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8398897647857666 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.5946365594863892 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C118505674 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5546784996986389 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[2].display_name | Encoder |
| concepts[3].id | https://openalex.org/C205711294 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5395658612251282 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q176953 |
| concepts[3].display_name | Rendering (computer graphics) |
| concepts[4].id | https://openalex.org/C204321447 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5202787518501282 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[4].display_name | Natural language processing |
| concepts[5].id | https://openalex.org/C160633673 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5157075524330139 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q355198 |
| concepts[5].display_name | Pixel |
| concepts[6].id | https://openalex.org/C137293760 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5103160738945007 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[6].display_name | Language model |
| concepts[7].id | https://openalex.org/C2777530160 |
| concepts[7].level | 2 |
| concepts[7].score | 0.502509593963623 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q41796 |
| concepts[7].display_name | Sentence |
| concepts[8].id | https://openalex.org/C41608201 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4912187457084656 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q980509 |
| concepts[8].display_name | Embedding |
| concepts[9].id | https://openalex.org/C66322947 |
| concepts[9].level | 3 |
| concepts[9].score | 0.4799533188343048 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[9].display_name | Transformer |
| concepts[10].id | https://openalex.org/C165696696 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4316719174385071 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11287 |
| concepts[10].display_name | Exploit |
| concepts[11].id | https://openalex.org/C195324797 |
| concepts[11].level | 2 |
| concepts[11].score | 0.4211128354072571 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q33742 |
| concepts[11].display_name | Natural language |
| concepts[12].id | https://openalex.org/C38652104 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[12].display_name | Computer security |
| concepts[13].id | https://openalex.org/C121332964 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[13].display_name | Physics |
| concepts[14].id | https://openalex.org/C165801399 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[14].display_name | Voltage |
| concepts[15].id | https://openalex.org/C111919701 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[15].display_name | Operating system |
| concepts[16].id | https://openalex.org/C62520636 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[16].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8398897647857666 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.5946365594863892 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/encoder |
| keywords[2].score | 0.5546784996986389 |
| keywords[2].display_name | Encoder |
| keywords[3].id | https://openalex.org/keywords/rendering |
| keywords[3].score | 0.5395658612251282 |
| keywords[3].display_name | Rendering (computer graphics) |
| keywords[4].id | https://openalex.org/keywords/natural-language-processing |
| keywords[4].score | 0.5202787518501282 |
| keywords[4].display_name | Natural language processing |
| keywords[5].id | https://openalex.org/keywords/pixel |
| keywords[5].score | 0.5157075524330139 |
| keywords[5].display_name | Pixel |
| keywords[6].id | https://openalex.org/keywords/language-model |
| keywords[6].score | 0.5103160738945007 |
| keywords[6].display_name | Language model |
| keywords[7].id | https://openalex.org/keywords/sentence |
| keywords[7].score | 0.502509593963623 |
| keywords[7].display_name | Sentence |
| keywords[8].id | https://openalex.org/keywords/embedding |
| keywords[8].score | 0.4912187457084656 |
| keywords[8].display_name | Embedding |
| keywords[9].id | https://openalex.org/keywords/transformer |
| keywords[9].score | 0.4799533188343048 |
| keywords[9].display_name | Transformer |
| keywords[10].id | https://openalex.org/keywords/exploit |
| keywords[10].score | 0.4316719174385071 |
| keywords[10].display_name | Exploit |
| keywords[11].id | https://openalex.org/keywords/natural-language |
| keywords[11].score | 0.4211128354072571 |
| keywords[11].display_name | Natural language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2212.08045 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2212.08045 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2212.08045 |
| locations[1].id | doi:10.48550/arxiv.2212.08045 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2212.08045 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5088082340 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8773-0641 |
| authorships[0].author.display_name | Michael Tschannen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Tschannen, Michael |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5072796087 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7305-7890 |
| authorships[1].author.display_name | Basil Mustafa |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mustafa, Basil |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5068878643 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Neil Houlsby |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Houlsby, Neil |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2212.08045 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | CLIPPO: Image-and-Language Understanding from Pixels Only |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2081900870, https://openalex.org/W4390516098, https://openalex.org/W2181948922, https://openalex.org/W2384362569, https://openalex.org/W2345479200, https://openalex.org/W2183306018, https://openalex.org/W2142795561, https://openalex.org/W2849310602, https://openalex.org/W4205302943, https://openalex.org/W3006008237 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2212.08045 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2212.08045 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2212.08045 |
| primary_location.id | pmh:oai:arXiv.org:2212.08045 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2212.08045 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2212.08045 |
| publication_date | 2022-12-15 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 46, 57, 85, 189 |
| abstract_inverted_index.We | 49 |
| abstract_inverted_index.an | 51 |
| abstract_inverted_index.as | 13, 96, 103, 110, 112 |
| abstract_inverted_index.by | 172 |
| abstract_inverted_index.et | 36 |
| abstract_inverted_index.in | 6, 167 |
| abstract_inverted_index.is | 70 |
| abstract_inverted_index.it | 79, 194 |
| abstract_inverted_index.no | 122 |
| abstract_inverted_index.of | 23, 56, 119 |
| abstract_inverted_index.on | 142, 199 |
| abstract_inverted_index.or | 125, 153 |
| abstract_inverted_index.so | 76 |
| abstract_inverted_index.to | 9, 61, 191 |
| abstract_inverted_index.we | 77, 180 |
| abstract_inverted_index.For | 32 |
| abstract_inverted_index.Our | 68 |
| abstract_inverted_index.and | 26, 29, 42, 65, 93, 105, 121, 134, 176 |
| abstract_inverted_index.any | 148 |
| abstract_inverted_index.are | 2 |
| abstract_inverted_index.can | 139, 163, 195 |
| abstract_inverted_index.due | 8 |
| abstract_inverted_index.not | 187 |
| abstract_inverted_index.the | 14, 54, 117, 174, 182 |
| abstract_inverted_index.use | 55 |
| abstract_inverted_index.via | 45, 130 |
| abstract_inverted_index.CLIP | 34 |
| abstract_inverted_index.Only | 81 |
| abstract_inverted_index.When | 127 |
| abstract_inverted_index.al., | 37 |
| abstract_inverted_index.both | 90 |
| abstract_inverted_index.call | 78 |
| abstract_inverted_index.does | 186 |
| abstract_inverted_index.fact | 183 |
| abstract_inverted_index.good | 165 |
| abstract_inverted_index.half | 116 |
| abstract_inverted_index.loss | 74, 150 |
| abstract_inverted_index.many | 24 |
| abstract_inverted_index.part | 7 |
| abstract_inverted_index.pure | 58 |
| abstract_inverted_index.show | 192 |
| abstract_inverted_index.such | 12, 102 |
| abstract_inverted_index.text | 41, 94 |
| abstract_inverted_index.that | 88, 184, 193 |
| abstract_inverted_index.uses | 84 |
| abstract_inverted_index.well | 111, 141 |
| abstract_inverted_index.with | 72, 115 |
| abstract_inverted_index.2021) | 38 |
| abstract_inverted_index.image | 43, 107, 177 |
| abstract_inverted_index.loss. | 48 |
| abstract_inverted_index.model | 60, 69 |
| abstract_inverted_index.often | 21 |
| abstract_inverted_index.prior | 159 |
| abstract_inverted_index.still | 20 |
| abstract_inverted_index.task- | 25 |
| abstract_inverted_index.tasks | 101 |
| abstract_inverted_index.text, | 64 |
| abstract_inverted_index.tower | 124 |
| abstract_inverted_index.work. | 160 |
| abstract_inverted_index.CLIPPO | 83, 98, 138, 162, 185 |
| abstract_inverted_index.almost | 109 |
| abstract_inverted_index.alone, | 75 |
| abstract_inverted_index.image, | 63 |
| abstract_inverted_index.images | 92 |
| abstract_inverted_index.masked | 154 |
| abstract_inverted_index.models | 1, 19 |
| abstract_inverted_index.number | 118 |
| abstract_inverted_index.obtain | 164 |
| abstract_inverted_index.pieces | 28 |
| abstract_inverted_index.simply | 171 |
| abstract_inverted_index.single | 86 |
| abstract_inverted_index.strong | 197 |
| abstract_inverted_index.tasks, | 146 |
| abstract_inverted_index.tasks. | 67 |
| abstract_inverted_index.towers | 44 |
| abstract_inverted_index.trains | 39 |
| abstract_inverted_index.visual | 168 |
| abstract_inverted_index.achieve | 196 |
| abstract_inverted_index.consist | 22 |
| abstract_inverted_index.encoder | 87 |
| abstract_inverted_index.exploit | 181 |
| abstract_inverted_index.explore | 50 |
| abstract_inverted_index.images. | 97 |
| abstract_inverted_index.jointly | 129 |
| abstract_inverted_index.models, | 114 |
| abstract_inverted_index.natural | 143 |
| abstract_inverted_index.perform | 62, 140 |
| abstract_inverted_index.regular | 91 |
| abstract_inverted_index.require | 188 |
| abstract_inverted_index.trained | 71, 128 |
| abstract_inverted_index.unified | 10 |
| abstract_inverted_index.without | 147, 203 |
| abstract_inverted_index.(Radford | 35 |
| abstract_inverted_index.Finally, | 179 |
| abstract_inverted_index.However, | 17 |
| abstract_inverted_index.accuracy | 166 |
| abstract_inverted_index.becoming | 3 |
| abstract_inverted_index.example, | 33 |
| abstract_inverted_index.language | 144, 155 |
| abstract_inverted_index.learning | 133 |
| abstract_inverted_index.performs | 99 |
| abstract_inverted_index.question | 169, 175 |
| abstract_inverted_index.rendered | 95 |
| abstract_inverted_index.training | 30 |
| abstract_inverted_index.(CLIPPO). | 82 |
| abstract_inverted_index.(language | 151 |
| abstract_inverted_index.learning, | 137 |
| abstract_inverted_index.modelling | 152 |
| abstract_inverted_index.processes | 89 |
| abstract_inverted_index.rendering | 173 |
| abstract_inverted_index.retrieval | 104, 202 |
| abstract_inverted_index.together. | 178 |
| abstract_inverted_index.tokenizer | 190 |
| abstract_inverted_index.zero-shot | 106 |
| abstract_inverted_index.CLIP-style | 113 |
| abstract_inverted_index.Multimodal | 0 |
| abstract_inverted_index.additional | 52 |
| abstract_inverted_index.answering, | 170 |
| abstract_inverted_index.effective, | 5 |
| abstract_inverted_index.embedding. | 126 |
| abstract_inverted_index.image-text | 131 |
| abstract_inverted_index.multimodal | 18, 66, 201 |
| abstract_inverted_index.parameters | 120 |
| abstract_inverted_index.word-level | 149 |
| abstract_inverted_index.CLIP-Pixels | 80 |
| abstract_inverted_index.Transformer | 15 |
| abstract_inverted_index.components, | 11 |
| abstract_inverted_index.contrastive | 47, 73, 132, 136 |
| abstract_inverted_index.image-based | 100 |
| abstract_inverted_index.independent | 40 |
| abstract_inverted_index.modelling), | 156 |
| abstract_inverted_index.performance | 198 |
| abstract_inverted_index.pixel-based | 59, 158 |
| abstract_inverted_index.procedures. | 31 |
| abstract_inverted_index.increasingly | 4 |
| abstract_inverted_index.multilingual | 200 |
| abstract_inverted_index.unification: | 53 |
| abstract_inverted_index.Surprisingly, | 161 |
| abstract_inverted_index.architecture. | 16 |
| abstract_inverted_index.next-sentence | 135 |
| abstract_inverted_index.outperforming | 157 |
| abstract_inverted_index.text-specific | 123 |
| abstract_inverted_index.understanding | 145 |
| abstract_inverted_index.classification | 108 |
| abstract_inverted_index.modifications. | 204 |
| abstract_inverted_index.modality-specific | 27 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.8399999737739563 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
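The abstract_inverted_index.* rows in the payload encode the abstract as a map from each token to its word positions rather than as plain text. A minimal sketch of turning such a map back into readable text; the function name and the tiny sample dict are illustrative, with positions taken from the rows above.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an OpenAlex abstract_inverted_index,
    which maps each token to the word positions where it occurs."""
    placed = [(pos, token) for token, positions in inverted_index.items() for pos in positions]
    return " ".join(token for _, token in sorted(placed))


# Tiny sample in the same shape as the abstract_inverted_index.* rows above:
sample = {"Multimodal": [0], "models": [1], "are": [2], "becoming": [3],
          "increasingly": [4], "effective,": [5]}
print(rebuild_abstract(sample))  # -> "Multimodal models are becoming increasingly effective,"
```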