Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2409.12191
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens. This approach allows the model to generate more efficient and accurate visual representations, closely aligning with human perceptual processes. The model also integrates Multimodal Rotary Position Embedding (M-RoPE), facilitating the effective fusion of positional information across text, images, and videos. We employ a unified paradigm for processing both images and videos, enhancing the model's visual perception capabilities. To explore the potential of large multimodal models, Qwen2-VL investigates the scaling laws for large vision-language models (LVLMs). By scaling both the model size (with versions at 2B, 8B, and 72B parameters) and the amount of training data, the Qwen2-VL Series achieves highly competitive performance. Notably, the Qwen2-VL-72B model achieves results comparable to leading models such as GPT-4o and Claude3.5-Sonnet across various multimodal benchmarks, outperforming other generalist models. Code is available at https://github.com/QwenLM/Qwen2-VL.
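The abstract says that images of different resolutions become different numbers of visual tokens, but it does not give the arithmetic. The sketch below is one way to illustrate the idea. It assumes a ViT patch size of 14, a 2×2 merge of adjacent patch tokens, and two boundary tokens per image; these values are recalled from the full paper and are not stated in this record. The function name `visual_token_count` is hypothetical and is not part of the released code.

```python
def visual_token_count(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Estimate how many visual tokens one image contributes.

    Assumptions (not stated in this record): the image is cut into
    patch x patch tiles, each merge x merge group of neighbouring tiles
    is compressed into one token, and two boundary tokens wrap the image.
    """
    unit = patch * merge  # side length, in pixels, covered by one merged token
    if height <= 0 or width <= 0 or height % unit or width % unit:
        raise ValueError(f"height and width must be positive multiples of {unit}")
    patches = (height // patch) * (width // patch)
    return patches // (merge * merge) + 2  # +2 for the assumed boundary tokens


if __name__ == "__main__":
    for size in (224, 448):
        print(size, visual_token_count(size, size))
```

Under these assumptions, doubling each side of the image roughly quadruples the token count, which is the sense in which the token budget follows the input resolution instead of being fixed in advance.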
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.12191, https://arxiv.org/pdf/2409.12191
- OA Status: green
- Cited By: 63
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403853618
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403853618 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.12191 (Digital Object Identifier)
- Title: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024
- Publication date: 2024-09-18
- Authors: Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Ge Wen-bin, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin (listed in order)
- Landing page: https://arxiv.org/abs/2409.12191 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.12191 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.12191 (direct OA link)
- Concepts: Perception, Resolution (logic), Computer science, Artificial intelligence, Computer vision, Psychology, Neuroscience (top concepts attached by OpenAlex)
- Cited by: 63 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 59, 2024: 4 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403853618 |
| doi | https://doi.org/10.48550/arxiv.2409.12191 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.12191 |
| ids.openalex | https://openalex.org/W4403853618 |
| fwci | |
| type | preprint |
| title | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12694 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.4867999851703644 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3205 |
| topics[0].subfield.display_name | Experimental and Cognitive Psychology |
| topics[0].display_name | Categorization, perception, and language |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C26760741 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5822054743766785 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q160402 |
| concepts[0].display_name | Perception |
| concepts[1].id | https://openalex.org/C138268822 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5793058276176453 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1051925 |
| concepts[1].display_name | Resolution (logic) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4538711905479431 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4362286925315857 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C31972630 |
| concepts[4].level | 1 |
| concepts[4].score | 0.36828944087028503 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[4].display_name | Computer vision |
| concepts[5].id | https://openalex.org/C15744967 |
| concepts[5].level | 0 |
| concepts[5].score | 0.2694791555404663 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[5].display_name | Psychology |
| concepts[6].id | https://openalex.org/C169760540 |
| concepts[6].level | 1 |
| concepts[6].score | 0.05089372396469116 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q207011 |
| concepts[6].display_name | Neuroscience |
| keywords[0].id | https://openalex.org/keywords/perception |
| keywords[0].score | 0.5822054743766785 |
| keywords[0].display_name | Perception |
| keywords[1].id | https://openalex.org/keywords/resolution |
| keywords[1].score | 0.5793058276176453 |
| keywords[1].display_name | Resolution (logic) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4538711905479431 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.4362286925315857 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/computer-vision |
| keywords[4].score | 0.36828944087028503 |
| keywords[4].display_name | Computer vision |
| keywords[5].id | https://openalex.org/keywords/psychology |
| keywords[5].score | 0.2694791555404663 |
| keywords[5].display_name | Psychology |
| keywords[6].id | https://openalex.org/keywords/neuroscience |
| keywords[6].score | 0.05089372396469116 |
| keywords[6].display_name | Neuroscience |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.12191 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.12191 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.12191 |
| locations[1].id | doi:10.48550/arxiv.2409.12191 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.12191 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5058176560 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8782-857X |
| authorships[0].author.display_name | Peng Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Peng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101014470 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6896-8590 |
| authorships[1].author.display_name | Shuai Bai |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Bai, Shuai |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5008661936 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2035-2479 |
| authorships[2].author.display_name | Sinan Tan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Tan, Sinan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5106406632 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Shijie Wang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wang, Shijie |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100313851 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Zhihao Fan |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Fan, Zhihao |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5063334231 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Jinze Bai |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Bai, Jinze |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5030987813 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-9091-8258 |
| authorships[6].author.display_name | Keqin Chen |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Chen, Keqin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5101944061 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-9612-3707 |
| authorships[7].author.display_name | Xuejing Liu |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Liu, Xuejing |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5100430890 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-5985-9061 |
| authorships[8].author.display_name | Jialin Wang |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Wang, Jialin |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5005725445 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Ge Wen-bin |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Ge, Wenbin |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5101520404 |
| authorships[10].author.orcid | https://orcid.org/0000-0001-8875-6686 |
| authorships[10].author.display_name | Yang Fan |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Fan, Yang |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5014400759 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Kai Dang |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Dang, Kai |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5019333214 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | Mengfei Du |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Du, Mengfei |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5049239373 |
| authorships[13].author.orcid | https://orcid.org/0000-0002-6994-2114 |
| authorships[13].author.display_name | Xuancheng Ren |
| authorships[13].author_position | middle |
| authorships[13].raw_author_name | Ren, Xuancheng |
| authorships[13].is_corresponding | False |
| authorships[14].author.id | https://openalex.org/A5004626105 |
| authorships[14].author.orcid | https://orcid.org/0000-0002-4429-3461 |
| authorships[14].author.display_name | Rui Men |
| authorships[14].author_position | middle |
| authorships[14].raw_author_name | Men, Rui |
| authorships[14].is_corresponding | False |
| authorships[15].author.id | https://openalex.org/A5062188134 |
| authorships[15].author.orcid | https://orcid.org/0000-0002-8755-8941 |
| authorships[15].author.display_name | Dayiheng Liu |
| authorships[15].author_position | middle |
| authorships[15].raw_author_name | Liu, Dayiheng |
| authorships[15].is_corresponding | False |
| authorships[16].author.id | https://openalex.org/A5091103295 |
| authorships[16].author.orcid | https://orcid.org/0000-0002-3744-2940 |
| authorships[16].author.display_name | Chang Zhou |
| authorships[16].author_position | middle |
| authorships[16].raw_author_name | Zhou, Chang |
| authorships[16].is_corresponding | False |
| authorships[17].author.id | https://openalex.org/A5113621558 |
| authorships[17].author.orcid | |
| authorships[17].author.display_name | Jingren Zhou |
| authorships[17].author_position | middle |
| authorships[17].raw_author_name | Zhou, Jingren |
| authorships[17].is_corresponding | False |
| authorships[18].author.id | https://openalex.org/A5100612233 |
| authorships[18].author.orcid | https://orcid.org/0000-0001-9931-383X |
| authorships[18].author.display_name | Junyang Lin |
| authorships[18].author_position | last |
| authorships[18].raw_author_name | Lin, Junyang |
| authorships[18].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.12191 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12694 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.4867999851703644 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3205 |
| primary_topic.subfield.display_name | Experimental and Cognitive Psychology |
| primary_topic.display_name | Categorization, perception, and language |
| related_works | https://openalex.org/W2772917594, https://openalex.org/W2036807459, https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747, https://openalex.org/W1969923398 |
| cited_by_count | 63 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 59 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 4 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.12191 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.12191 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.12191 |
| primary_location.id | pmh:oai:arXiv.org:2409.12191 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.12191 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.12191 |
| publication_date | 2024-09-18 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.. | 174 |
| abstract_inverted_index.a | 88 |
| abstract_inverted_index.By | 121 |
| abstract_inverted_index.To | 103 |
| abstract_inverted_index.We | 0, 86 |
| abstract_inverted_index.an | 5 |
| abstract_inverted_index.as | 157 |
| abstract_inverted_index.at | 128, 172 |
| abstract_inverted_index.in | 19 |
| abstract_inverted_index.is | 170 |
| abstract_inverted_index.of | 8, 37, 43, 78, 107, 136 |
| abstract_inverted_index.to | 33, 51, 153 |
| abstract_inverted_index.2B, | 129 |
| abstract_inverted_index.72B | 132 |
| abstract_inverted_index.8B, | 130 |
| abstract_inverted_index.The | 65 |
| abstract_inverted_index.and | 55, 84, 95, 131, 159 |
| abstract_inverted_index.for | 91, 116 |
| abstract_inverted_index.the | 2, 9, 15, 24, 31, 49, 75, 98, 105, 113, 124, 134, 139, 147 |
| abstract_inverted_index.Code | 169 |
| abstract_inverted_index.This | 46 |
| abstract_inverted_index.also | 67 |
| abstract_inverted_index.both | 93, 123 |
| abstract_inverted_index.into | 40 |
| abstract_inverted_index.laws | 115 |
| abstract_inverted_index.more | 53 |
| abstract_inverted_index.such | 156 |
| abstract_inverted_index.that | 13 |
| abstract_inverted_index.with | 61 |
| abstract_inverted_index.Naive | 25 |
| abstract_inverted_index.data, | 138 |
| abstract_inverted_index.human | 62 |
| abstract_inverted_index.large | 108, 117 |
| abstract_inverted_index.model | 32, 50, 66, 125, 149 |
| abstract_inverted_index.other | 166 |
| abstract_inverted_index.text, | 82 |
| abstract_inverted_index.which | 29 |
| abstract_inverted_index.GPT-4o | 158 |
| abstract_inverted_index.Rotary | 70 |
| abstract_inverted_index.Series | 141 |
| abstract_inverted_index.across | 81, 161 |
| abstract_inverted_index.allows | 48 |
| abstract_inverted_index.amount | 135 |
| abstract_inverted_index.employ | 87 |
| abstract_inverted_index.fusion | 77 |
| abstract_inverted_index.highly | 143 |
| abstract_inverted_index.images | 36, 94 |
| abstract_inverted_index.models | 12, 119, 155 |
| abstract_inverted_index.visual | 20, 44, 57, 100 |
| abstract_inverted_index.Dynamic | 26 |
| abstract_inverted_index.Qwen-VL | 11 |
| abstract_inverted_index.Series, | 4 |
| abstract_inverted_index.closely | 59 |
| abstract_inverted_index.enables | 30 |
| abstract_inverted_index.explore | 104 |
| abstract_inverted_index.images, | 83 |
| abstract_inverted_index.leading | 154 |
| abstract_inverted_index.model's | 99 |
| abstract_inverted_index.models, | 110 |
| abstract_inverted_index.models. | 168 |
| abstract_inverted_index.numbers | 42 |
| abstract_inverted_index.present | 1 |
| abstract_inverted_index.process | 35 |
| abstract_inverted_index.results | 151 |
| abstract_inverted_index.scaling | 114, 122 |
| abstract_inverted_index.tokens. | 45 |
| abstract_inverted_index.unified | 89 |
| abstract_inverted_index.upgrade | 7 |
| abstract_inverted_index.various | 162 |
| abstract_inverted_index.varying | 38 |
| abstract_inverted_index.videos, | 96 |
| abstract_inverted_index.videos. | 85 |
| abstract_inverted_index.(LVLMs). | 120 |
| abstract_inverted_index.Notably, | 146 |
| abstract_inverted_index.Position | 71 |
| abstract_inverted_index.Qwen2-VL | 3, 22, 111, 140 |
| abstract_inverted_index.accurate | 56 |
| abstract_inverted_index.achieves | 142, 150 |
| abstract_inverted_index.advanced | 6 |
| abstract_inverted_index.aligning | 60 |
| abstract_inverted_index.approach | 18, 47 |
| abstract_inverted_index.generate | 52 |
| abstract_inverted_index.paradigm | 90 |
| abstract_inverted_index.previous | 10 |
| abstract_inverted_index.training | 137 |
| abstract_inverted_index.versions | 127 |
| abstract_inverted_index.(M-RoPE), | 73 |
| abstract_inverted_index.Embedding | 72 |
| abstract_inverted_index.available | 171 |
| abstract_inverted_index.different | 41 |
| abstract_inverted_index.effective | 76 |
| abstract_inverted_index.efficient | 54 |
| abstract_inverted_index.enhancing | 97 |
| abstract_inverted_index.potential | 106 |
| abstract_inverted_index.redefines | 14 |
| abstract_inverted_index.size-with | 126 |
| abstract_inverted_index.Multimodal | 69 |
| abstract_inverted_index.Resolution | 27 |
| abstract_inverted_index.comparable | 152 |
| abstract_inverted_index.generalist | 167 |
| abstract_inverted_index.integrates | 68 |
| abstract_inverted_index.introduces | 23 |
| abstract_inverted_index.mechanism, | 28 |
| abstract_inverted_index.multimodal | 109, 163 |
| abstract_inverted_index.perception | 101 |
| abstract_inverted_index.perceptual | 63 |
| abstract_inverted_index.positional | 79 |
| abstract_inverted_index.processes. | 64 |
| abstract_inverted_index.processing | 92 |
| abstract_inverted_index.benchmarks, | 164 |
| abstract_inverted_index.competitive | 144 |
| abstract_inverted_index.dynamically | 34 |
| abstract_inverted_index.information | 80 |
| abstract_inverted_index.processing. | 21 |
| abstract_inverted_index.resolutions | 39 |
| abstract_inverted_index.Qwen2-VL-72B | 148 |
| abstract_inverted_index.conventional | 16 |
| abstract_inverted_index.facilitating | 74 |
| abstract_inverted_index.investigates | 112 |
| abstract_inverted_index.performance. | 145 |
| abstract_inverted_index.capabilities. | 102 |
| abstract_inverted_index.outperforming | 165 |
| abstract_inverted_index.parameters-and | 133 |
| abstract_inverted_index.vision-language | 118 |
| abstract_inverted_index.Claude3.5-Sonnet | 160 |
| abstract_inverted_index.representations, | 58 |
| abstract_inverted_index.predetermined-resolution | 17 |
| abstract_inverted_index.https://github.com/QwenLM/Qwen2-VL | 173 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 19 |
| citation_normalized_percentile | |
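
The `abstract_inverted_index.*` rows in the table store the abstract as a mapping from each word to the positions where it occurs. A minimal sketch of turning such a mapping back into running text is shown below. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first nine positions copied from the rows above (later positions are left out for brevity).

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in ascending position order.
    return " ".join(by_position[i] for i in sorted(by_position))


# Positions 0-8 only, taken from the payload table; higher positions omitted.
sample = {
    "We": [0],
    "present": [1],
    "the": [2],
    "Qwen2-VL": [3],
    "Series,": [4],
    "an": [5],
    "advanced": [6],
    "upgrade": [7],
    "of": [8],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
```

Sorting by position, instead of relying on the order of the keys, matters here because the table lists the words by length and then alphabetically, not in reading order.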