Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2210.03347
Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, and image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images.
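
The abstract mentions a variable-resolution input representation but does not spell it out. The sketch below shows one way such a representation can work, assuming the image is rescaled with its aspect ratio preserved so that as many fixed-size patches as possible fit within a patch budget, instead of being squashed into a fixed square. The 16-pixel patch size, the 2048-patch budget, and the function name `patch_grid` are illustrative assumptions, not details taken from this record.

```python
import math


def patch_grid(image_height: int, image_width: int,
               max_patches: int = 2048, patch_size: int = 16) -> tuple[int, int]:
    """Return (rows, cols) of patches for an aspect-preserving rescale.

    Hypothetical helper: picks one scale factor s for both axes so that
    (s*H/p) * (s*W/p) <= max_patches, then floors to whole patches.
    """
    if image_height <= 0 or image_width <= 0:
        raise ValueError("image dimensions must be positive")
    # Solve (s*H/p) * (s*W/p) = max_patches for the shared scale factor s.
    scale = math.sqrt(max_patches * patch_size * patch_size
                      / (image_height * image_width))
    # Floor to whole patches; keep at least one row/column and never exceed
    # the budget along a single axis (matters for extreme aspect ratios).
    rows = max(min(math.floor(scale * image_height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_width / patch_size), max_patches), 1)
    return rows, cols
```

For example, `patch_grid(768, 1024)` gives a 39x52 grid (2,028 patches), keeping the 3:4 aspect ratio. A fixed square resize would have to distort such an image, which matters for screenshots and documents whose layout carries meaning.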
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2210.03347
- PDF: https://arxiv.org/pdf/2210.03347
- OA status: green
- Cited by: 45
- Related works: 10
- OpenAlex ID: https://openalex.org/W4304192731
Raw OpenAlex JSON
- DOI: https://doi.org/10.48550/arxiv.2210.03347
- Title: Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
- Publication year: 2022
- Publication date: 2022-10-07
- Authors (in order): Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter J. Shaw, Ming‐Wei Chang, Kristina Toutanova
- PDF URL: https://arxiv.org/pdf/2210.03347
- Open access: Yes
- OA URL: https://arxiv.org/pdf/2210.03347
- Concepts (top fields/topics attached by OpenAlex): Computer science, Parsing, Closed captioning, Natural language processing, Artificial intelligence, Natural language, Situated, Domain (mathematical analysis), Language model, Representation (politics), Human–computer interaction, Image (mathematics), Mathematics, Law, Political science, Politics, Mathematical analysis
- Citations by year (recent): 2025: 5, 2024: 22, 2023: 18
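
The full payload table that follows is a flattened view of the OpenAlex record: nested objects become dotted keys (`topics[0].field.id`), lists of objects get bracketed indices, and lists of plain values are joined into one cell (as in `related_works`). A minimal sketch of that flattening, assuming the record is already parsed into Python dicts and lists; `flatten` is a hypothetical helper, not part of any OpenAlex client library.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested dicts/lists into 'a.b[0].c' keys (hypothetical helper)."""
    out: dict[str, str] = {}
    if isinstance(value, dict):
        # Objects: extend the key path with a dot, e.g. topics[0].field.id.
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
        # Lists of objects: index each element, e.g. topics[0].
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
    elif isinstance(value, list):
        # Lists of scalars: join into one cell, e.g. "52, 100, 137".
        out[prefix] = ", ".join(str(v) for v in value)
    else:
        # Scalars: None becomes an empty cell, everything else uses str().
        out[prefix] = "" if value is None else str(value)
    return out
```

Each resulting pair maps to one `| key | value |` row, so an empty field such as `fwci` shows up as a row with an empty value cell.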
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4304192731 |
| doi | https://doi.org/10.48550/arxiv.2210.03347 |
| ids.doi | https://doi.org/10.48550/arxiv.2210.03347 |
| ids.openalex | https://openalex.org/W4304192731 |
| fwci | |
| type | preprint |
| title | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998000264167786 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9878000020980835 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9732000231742859 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8644065260887146 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C186644900 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7467710971832275 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q194152 |
| concepts[1].display_name | Parsing |
| concepts[2].id | https://openalex.org/C157657479 |
| concepts[2].level | 3 |
| concepts[2].score | 0.685272753238678 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2367247 |
| concepts[2].display_name | Closed captioning |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5966121554374695 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5181930661201477 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C195324797 |
| concepts[5].level | 2 |
| concepts[5].score | 0.500176191329956 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q33742 |
| concepts[5].display_name | Natural language |
| concepts[6].id | https://openalex.org/C132829578 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4910585880279541 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q581151 |
| concepts[6].display_name | Situated |
| concepts[7].id | https://openalex.org/C36503486 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4809402823448181 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[7].display_name | Domain (mathematical analysis) |
| concepts[8].id | https://openalex.org/C137293760 |
| concepts[8].level | 2 |
| concepts[8].score | 0.47828564047813416 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[8].display_name | Language model |
| concepts[9].id | https://openalex.org/C2776359362 |
| concepts[9].level | 3 |
| concepts[9].score | 0.43287044763565063 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[9].display_name | Representation (politics) |
| concepts[10].id | https://openalex.org/C107457646 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3254523277282715 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[10].display_name | Human–computer interaction |
| concepts[11].id | https://openalex.org/C115961682 |
| concepts[11].level | 2 |
| concepts[11].score | 0.2771185040473938 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[11].display_name | Image (mathematics) |
| concepts[12].id | https://openalex.org/C33923547 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[12].display_name | Mathematics |
| concepts[13].id | https://openalex.org/C199539241 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[13].display_name | Law |
| concepts[14].id | https://openalex.org/C17744445 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[14].display_name | Political science |
| concepts[15].id | https://openalex.org/C94625758 |
| concepts[15].level | 2 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[15].display_name | Politics |
| concepts[16].id | https://openalex.org/C134306372 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[16].display_name | Mathematical analysis |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8644065260887146 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/parsing |
| keywords[1].score | 0.7467710971832275 |
| keywords[1].display_name | Parsing |
| keywords[2].id | https://openalex.org/keywords/closed-captioning |
| keywords[2].score | 0.685272753238678 |
| keywords[2].display_name | Closed captioning |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.5966121554374695 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5181930661201477 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/natural-language |
| keywords[5].score | 0.500176191329956 |
| keywords[5].display_name | Natural language |
| keywords[6].id | https://openalex.org/keywords/situated |
| keywords[6].score | 0.4910585880279541 |
| keywords[6].display_name | Situated |
| keywords[7].id | https://openalex.org/keywords/domain |
| keywords[7].score | 0.4809402823448181 |
| keywords[7].display_name | Domain (mathematical analysis) |
| keywords[8].id | https://openalex.org/keywords/language-model |
| keywords[8].score | 0.47828564047813416 |
| keywords[8].display_name | Language model |
| keywords[9].id | https://openalex.org/keywords/representation |
| keywords[9].score | 0.43287044763565063 |
| keywords[9].display_name | Representation (politics) |
| keywords[10].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[10].score | 0.3254523277282715 |
| keywords[10].display_name | Human–computer interaction |
| keywords[11].id | https://openalex.org/keywords/image |
| keywords[11].score | 0.2771185040473938 |
| keywords[11].display_name | Image (mathematics) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2210.03347 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2210.03347 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2210.03347 |
| locations[1].id | doi:10.48550/arxiv.2210.03347 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2210.03347 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5081862885 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-9534-5970 |
| authorships[0].author.display_name | Kenton Lee |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lee, Kenton |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5108202364 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Mandar Joshi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Joshi, Mandar |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5086294822 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Iulia Turc |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Turc, Iulia |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065708799 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4720-169X |
| authorships[3].author.display_name | Hexiang Hu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hu, Hexiang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5026154387 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7038-3623 |
| authorships[4].author.display_name | Fangyu Liu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Liu, Fangyu |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5000738730 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Julian Martin Eisenschlos |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Eisenschlos, Julian |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5088072227 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Urvashi Khandelwal |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Khandelwal, Urvashi |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5061827062 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-0101-4482 |
| authorships[7].author.display_name | Peter J. Shaw |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Shaw, Peter |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5076904467 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-0137-8895 |
| authorships[8].author.display_name | Ming‐Wei Chang |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Chang, Ming-Wei |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5053947885 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Kristina Toutanova |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Toutanova, Kristina |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2210.03347 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998000264167786 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4210416330, https://openalex.org/W2775506363, https://openalex.org/W3088136942, https://openalex.org/W4290852288, https://openalex.org/W2949362007, https://openalex.org/W4386271066, https://openalex.org/W3009270862, https://openalex.org/W2066060456, https://openalex.org/W2293063786, https://openalex.org/W2911292476 |
| cited_by_count | 45 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 5 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 22 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 18 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2210.03347 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2210.03347 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2210.03347 |
| primary_location.id | pmh:oai:arXiv.org:2210.03347 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2210.03347 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2210.03347 |
| publication_date | 2022-10-07 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 52, 100, 137, 142, 173 |
| abstract_inverted_index.-- | 4 |
| abstract_inverted_index.In | 128 |
| abstract_inverted_index.We | 49 |
| abstract_inverted_index.as | 122, 155 |
| abstract_inverted_index.be | 63 |
| abstract_inverted_index.by | 73 |
| abstract_inverted_index.in | 95, 181 |
| abstract_inverted_index.is | 2, 71 |
| abstract_inverted_index.of | 41, 79, 90, 103, 111, 146, 162, 184 |
| abstract_inverted_index.on | 35, 65, 160 |
| abstract_inverted_index.to | 11, 18, 27, 75, 108, 130 |
| abstract_inverted_index.we | 135, 170 |
| abstract_inverted_index.For | 166 |
| abstract_inverted_index.The | 85 |
| abstract_inverted_index.and | 16, 23, 47, 141, 148, 194 |
| abstract_inverted_index.are | 157 |
| abstract_inverted_index.can | 62, 177 |
| abstract_inverted_index.due | 26 |
| abstract_inverted_index.for | 56 |
| abstract_inverted_index.has | 32 |
| abstract_inverted_index.its | 88 |
| abstract_inverted_index.out | 183 |
| abstract_inverted_index.six | 182 |
| abstract_inverted_index.the | 42, 96, 109, 131, 163, 167 |
| abstract_inverted_index.top | 161 |
| abstract_inverted_index.web | 12, 80 |
| abstract_inverted_index.HTML | 97 |
| abstract_inverted_index.OCR, | 123 |
| abstract_inverted_index.apps | 20 |
| abstract_inverted_index.data | 105 |
| abstract_inverted_index.four | 188 |
| abstract_inverted_index.from | 7 |
| abstract_inverted_index.into | 82 |
| abstract_inverted_index.more | 143 |
| abstract_inverted_index.nine | 185 |
| abstract_inverted_index.show | 171 |
| abstract_inverted_index.such | 121, 154 |
| abstract_inverted_index.that | 172 |
| abstract_inverted_index.this | 28, 115 |
| abstract_inverted_index.user | 192 |
| abstract_inverted_index.web, | 86 |
| abstract_inverted_index.well | 106 |
| abstract_inverted_index.with | 9, 14, 21, 38, 87 |
| abstract_inverted_index.work | 31 |
| abstract_inverted_index.HTML. | 84 |
| abstract_inverted_index.data, | 44 |
| abstract_inverted_index.first | 168 |
| abstract_inverted_index.image | 126 |
| abstract_inverted_index.input | 139, 164 |
| abstract_inverted_index.large | 101 |
| abstract_inverted_index.model | 45, 55, 176 |
| abstract_inverted_index.novel | 132 |
| abstract_inverted_index.pages | 13, 81 |
| abstract_inverted_index.parse | 76 |
| abstract_inverted_index.range | 6 |
| abstract_inverted_index.tasks | 66, 186 |
| abstract_inverted_index.time, | 169 |
| abstract_inverted_index.where | 151 |
| abstract_inverted_index.which | 61 |
| abstract_inverted_index.across | 187 |
| abstract_inverted_index.common | 118 |
| abstract_inverted_index.forms. | 24 |
| abstract_inverted_index.image. | 165 |
| abstract_inverted_index.images | 15 |
| abstract_inverted_index.masked | 77 |
| abstract_inverted_index.mobile | 19 |
| abstract_inverted_index.purely | 57 |
| abstract_inverted_index.relied | 34 |
| abstract_inverted_index.single | 174 |
| abstract_inverted_index.source | 102 |
| abstract_inverted_index.suited | 107 |
| abstract_inverted_index.tasks. | 113 |
| abstract_inverted_index.vision | 149 |
| abstract_inverted_index.visual | 58, 91 |
| abstract_inverted_index.Perhaps | 25 |
| abstract_inverted_index.achieve | 178 |
| abstract_inverted_index.buttons | 22 |
| abstract_inverted_index.cleanly | 93 |
| abstract_inverted_index.images. | 196 |
| abstract_inverted_index.inputs, | 150 |
| abstract_inverted_index.limited | 39 |
| abstract_inverted_index.natural | 195 |
| abstract_inverted_index.present | 50 |
| abstract_inverted_index.prompts | 153 |
| abstract_inverted_index.recipes | 37 |
| abstract_inverted_index.results | 180 |
| abstract_inverted_index.sharing | 40 |
| abstract_inverted_index.signals | 120 |
| abstract_inverted_index.sources | 5 |
| abstract_inverted_index.tables, | 17 |
| abstract_inverted_index.addition | 129 |
| abstract_inverted_index.diagrams | 10 |
| abstract_inverted_index.directly | 159 |
| abstract_inverted_index.domains: | 189 |
| abstract_inverted_index.elements | 92 |
| abstract_inverted_index.flexible | 144 |
| abstract_inverted_index.language | 1, 59, 124, 147, 152 |
| abstract_inverted_index.learning | 74 |
| abstract_inverted_index.previous | 30 |
| abstract_inverted_index.provides | 99 |
| abstract_inverted_index.rendered | 158 |
| abstract_inverted_index.richness | 89 |
| abstract_inverted_index.subsumes | 117 |
| abstract_inverted_index.diversity | 110 |
| abstract_inverted_index.finetuned | 64 |
| abstract_inverted_index.introduce | 136 |
| abstract_inverted_index.language. | 69 |
| abstract_inverted_index.modeling, | 125 |
| abstract_inverted_index.objective | 116 |
| abstract_inverted_index.questions | 156 |
| abstract_inverted_index.reflected | 94 |
| abstract_inverted_index.strategy, | 134 |
| abstract_inverted_index.textbooks | 8 |
| abstract_inverted_index.typically | 33 |
| abstract_inverted_index.Pix2Struct | 70 |
| abstract_inverted_index.containing | 67 |
| abstract_inverted_index.diversity, | 29 |
| abstract_inverted_index.documents, | 190 |
| abstract_inverted_index.downstream | 112 |
| abstract_inverted_index.pretrained | 53, 72, 175 |
| abstract_inverted_index.simplified | 83 |
| abstract_inverted_index.structure, | 98 |
| abstract_inverted_index.ubiquitous | 3 |
| abstract_inverted_index.underlying | 43 |
| abstract_inverted_index.Pix2Struct, | 51 |
| abstract_inverted_index.captioning. | 127 |
| abstract_inverted_index.integration | 145 |
| abstract_inverted_index.interfaces, | 193 |
| abstract_inverted_index.objectives. | 48 |
| abstract_inverted_index.pretraining | 104, 119, 133 |
| abstract_inverted_index.screenshots | 78 |
| abstract_inverted_index.Intuitively, | 114 |
| abstract_inverted_index.image-to-text | 54 |
| abstract_inverted_index.architectures, | 46 |
| abstract_inverted_index.illustrations, | 191 |
| abstract_inverted_index.representation | 140 |
| abstract_inverted_index.understanding, | 60 |
| abstract_inverted_index.domain-specific | 36 |
| abstract_inverted_index.state-of-the-art | 179 |
| abstract_inverted_index.Visually-situated | 0 |
| abstract_inverted_index.visually-situated | 68 |
| abstract_inverted_index.variable-resolution | 138 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.8100000023841858 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
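
The `abstract_inverted_index` rows store the abstract as an inverted index: each token, punctuation included, maps to the zero-based word positions where it occurs (for example, `Visually-situated` at 0 and `language` at 1, 59, 124, 147, 152). Inverting the mapping recovers the plain text. A minimal sketch, assuming the index has been parsed into a dict from token to list of positions; `rebuild_abstract` is a hypothetical name.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a {token: [positions]} inverted index."""
    positions: dict[int, str] = {}
    for token, where in inverted_index.items():
        for pos in where:
            positions[pos] = token
    # Emit tokens in position order; tokens already carry their punctuation,
    # so a single space between them restores the original spacing.
    return " ".join(positions[i] for i in sorted(positions))
```

This sketch silently skips any missing position and does not detect two tokens claiming the same position; a stricter version could check that the positions form a contiguous range starting at 0.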