Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2509.09731
Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but remain hard to digitize and understand: traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with the documents' visual and linguistic complexity. Existing document benchmarks focus on English printed text or simplified Chinese, leaving a gap in evaluating VLMs on ancient Chinese documents. To address this, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, and linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.09731, https://arxiv.org/pdf/2509.09731
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414591483
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4414591483 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.09731 (Digital Object Identifier)
- Title: Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-09-10 (full publication date if available)
- Authors: Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2509.09731 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2509.09731 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.09731 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4414591483 |
| doi | https://doi.org/10.48550/arxiv.2509.09731 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.09731 |
| ids.openalex | https://openalex.org/W4414591483 |
| fwci | |
| type | preprint |
| title | Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11148 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.8411999940872192 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3205 |
| topics[0].subfield.display_name | Experimental and Cognitive Psychology |
| topics[0].display_name | Language, Metaphor, and Cognition |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.09731 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.09731 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.09731 |
| locations[1].id | doi:10.48550/arxiv.2509.09731 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.09731 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101650550 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Haiyang Yu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yu, Haiyang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102918313 |
| authorships[1].author.orcid | https://orcid.org/0009-0000-3657-5859 |
| authorships[1].author.display_name | Yuchuan Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Yuchuan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5052376376 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-9329-077X |
| authorships[2].author.display_name | Fan Shi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Shi, Fan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5036022527 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1325-2410 |
| authorships[3].author.display_name | Lei Liao |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liao, Lei |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5050212540 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7149-6961 |
| authorships[4].author.display_name | Jinghui Lu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Lu, Jinghui |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5061838211 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-4796-0223 |
| authorships[5].author.display_name | Xiaodong Ge |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ge, Xiaodong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100452675 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-1320-0947 |
| authorships[6].author.display_name | Han Wang |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Wang, Han |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5119757718 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Minghan Zhuo |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Zhuo, Minghan |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5053734709 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-9897-8776 |
| authorships[8].author.display_name | Xuecheng Wu |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Wu, Xuecheng |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5100660005 |
| authorships[9].author.orcid | https://orcid.org/0000-0001-8271-1815 |
| authorships[9].author.display_name | Xiang Fei |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Fei, Xiang |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5005177314 |
| authorships[10].author.orcid | https://orcid.org/0000-0002-0899-0438 |
| authorships[10].author.display_name | Hao Feng |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Feng, Hao |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5107479624 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Guozhi Tang |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Tang, Guozhi |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5048679011 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | An-Lan Wang |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Wang, An-Lan |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5108777130 |
| authorships[13].author.orcid | |
| authorships[13].author.display_name | Hanshen Zhu |
| authorships[13].author_position | middle |
| authorships[13].raw_author_name | Zhu, Hanshen |
| authorships[13].is_corresponding | False |
| authorships[14].author.id | https://openalex.org/A5100642695 |
| authorships[14].author.orcid | https://orcid.org/0000-0002-3153-5177 |
| authorships[14].author.display_name | Yangfan He |
| authorships[14].author_position | middle |
| authorships[14].raw_author_name | He, Yangfan |
| authorships[14].is_corresponding | False |
| authorships[15].author.id | https://openalex.org/A5080142698 |
| authorships[15].author.orcid | https://orcid.org/0000-0002-1042-1795 |
| authorships[15].author.display_name | Quanhuan Liang |
| authorships[15].author_position | middle |
| authorships[15].raw_author_name | Liang, Quanhuan |
| authorships[15].is_corresponding | False |
| authorships[16].author.id | https://openalex.org/A5068765373 |
| authorships[16].author.orcid | |
| authorships[16].author.display_name | Liyuan Meng |
| authorships[16].author_position | middle |
| authorships[16].raw_author_name | Meng, Liyuan |
| authorships[16].is_corresponding | False |
| authorships[17].author.id | https://openalex.org/A5103124726 |
| authorships[17].author.orcid | https://orcid.org/0000-0003-3636-7210 |
| authorships[17].author.display_name | Chao Feng |
| authorships[17].author_position | middle |
| authorships[17].raw_author_name | Feng, Chao |
| authorships[17].is_corresponding | False |
| authorships[18].author.id | https://openalex.org/A5059214587 |
| authorships[18].author.orcid | https://orcid.org/0000-0001-8167-6114 |
| authorships[18].author.display_name | Can Huang |
| authorships[18].author_position | middle |
| authorships[18].raw_author_name | Huang, Can |
| authorships[18].is_corresponding | False |
| authorships[19].author.id | https://openalex.org/A5100681719 |
| authorships[19].author.orcid | https://orcid.org/0000-0003-2577-0119 |
| authorships[19].author.display_name | Jingqun Tang |
| authorships[19].author_position | middle |
| authorships[19].raw_author_name | Tang, Jingqun |
| authorships[19].is_corresponding | False |
| authorships[20].author.id | https://openalex.org/A5100365206 |
| authorships[20].author.orcid | https://orcid.org/0000-0002-7328-9947 |
| authorships[20].author.display_name | Bin Li |
| authorships[20].author_position | last |
| authorships[20].raw_author_name | Li, Bin |
| authorships[20].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.09731 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11148 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.8411999940872192 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3205 |
| primary_topic.subfield.display_name | Experimental and Cognitive Psychology |
| primary_topic.display_name | Language, Metaphor, and Cognition |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.09731 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.09731 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.09731 |
| primary_location.id | pmh:oai:arXiv.org:2509.09731 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.09731 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.09731 |
| publication_date | 2025-09-10 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 55, 125 |
| abstract_inverted_index.14 | 103 |
| abstract_inverted_index.To | 64 |
| abstract_inverted_index.by | 124 |
| abstract_inverted_index.in | 21 |
| abstract_inverted_index.of | 5, 7 |
| abstract_inverted_index.on | 47, 60, 114 |
| abstract_inverted_index.or | 51 |
| abstract_inverted_index.to | 78, 83 |
| abstract_inverted_index.we | 67, 116 |
| abstract_inverted_index.100 | 107 |
| abstract_inverted_index.OCR | 82 |
| abstract_inverted_index.QA) | 100 |
| abstract_inverted_index.QA, | 95, 97 |
| abstract_inverted_index.and | 10, 23, 40, 101, 109 |
| abstract_inverted_index.but | 18 |
| abstract_inverted_index.for | 57, 73, 130 |
| abstract_inverted_index.gap | 56 |
| abstract_inverted_index.the | 70 |
| abstract_inverted_index.OCR, | 91 |
| abstract_inverted_index.VLMs | 59, 80, 119 |
| abstract_inverted_index.face | 19 |
| abstract_inverted_index.five | 88 |
| abstract_inverted_index.from | 81 |
| abstract_inverted_index.hold | 12 |
| abstract_inverted_index.only | 28 |
| abstract_inverted_index.over | 106 |
| abstract_inverted_index.rich | 13 |
| abstract_inverted_index.scan | 29 |
| abstract_inverted_index.with | 37 |
| abstract_inverted_index.3,000 | 111 |
| abstract_inverted_index.Based | 113 |
| abstract_inverted_index.about | 110 |
| abstract_inverted_index.first | 71 |
| abstract_inverted_index.focus | 46 |
| abstract_inverted_index.i.e., | 25 |
| abstract_inverted_index.large | 127 |
| abstract_inverted_index.model | 129 |
| abstract_inverted_index.tasks | 89 |
| abstract_inverted_index.texts | 50 |
| abstract_inverted_index.their | 38 |
| abstract_inverted_index.this, | 66 |
| abstract_inverted_index.using | 120 |
| abstract_inverted_index.while | 31 |
| abstract_inverted_index.(VLMs) | 35 |
| abstract_inverted_index.Models | 34 |
| abstract_inverted_index.across | 15 |
| abstract_inverted_index.assess | 79 |
| abstract_inverted_index.books, | 108 |
| abstract_inverted_index.covers | 102 |
| abstract_inverted_index.fields | 17 |
| abstract_inverted_index.pages. | 112 |
| abstract_inverted_index.types, | 105 |
| abstract_inverted_index.visual | 39 |
| abstract_inverted_index.Chinese | 0, 8, 62, 74 |
| abstract_inverted_index.English | 48 |
| abstract_inverted_index.address | 65 |
| abstract_inverted_index.ancient | 1, 61, 75 |
| abstract_inverted_index.current | 32 |
| abstract_inverted_index.diverse | 16 |
| abstract_inverted_index.history | 9 |
| abstract_inverted_index.images, | 30 |
| abstract_inverted_index.leaving | 54 |
| abstract_inverted_index.methods | 27 |
| abstract_inverted_index.present | 68 |
| abstract_inverted_index.printed | 49 |
| abstract_inverted_index.variant | 99 |
| abstract_inverted_index.Chinese, | 53 |
| abstract_inverted_index.Existing | 43 |
| abstract_inverted_index.carriers | 4 |
| abstract_inverted_index.culture, | 11 |
| abstract_inverted_index.designed | 77 |
| abstract_inverted_index.document | 44, 104 |
| abstract_inverted_index.evaluate | 117 |
| abstract_inverted_index.includes | 87 |
| abstract_inverted_index.language | 128 |
| abstract_inverted_index.metrics, | 122 |
| abstract_inverted_index.multiple | 121 |
| abstract_inverted_index.scoring. | 131 |
| abstract_inverted_index.struggle | 36 |
| abstract_inverted_index.benchmark | 72 |
| abstract_inverted_index.knowledge | 14, 84 |
| abstract_inverted_index.millennia | 6 |
| abstract_inverted_index.AncientDoc | 86 |
| abstract_inverted_index.benchmarks | 45 |
| abstract_inverted_index.challenges | 20 |
| abstract_inverted_index.documents, | 2, 76 |
| abstract_inverted_index.documents. | 63 |
| abstract_inverted_index.evaluating | 58 |
| abstract_inverted_index.invaluable | 3 |
| abstract_inverted_index.linguistic | 41, 98 |
| abstract_inverted_index.mainstream | 118 |
| abstract_inverted_index.reasoning. | 85 |
| abstract_inverted_index.simplified | 52 |
| abstract_inverted_index.vernacular | 92 |
| abstract_inverted_index.(page-level | 90 |
| abstract_inverted_index.AncientDoc, | 69, 115 |
| abstract_inverted_index.complexity. | 42 |
| abstract_inverted_index.traditional | 26 |
| abstract_inverted_index.digitization | 22 |
| abstract_inverted_index.supplemented | 123 |
| abstract_inverted_index.translation, | 93 |
| abstract_inverted_index.human-aligned | 126 |
| abstract_inverted_index.understanding, | 24 |
| abstract_inverted_index.Vision-Language | 33 |
| abstract_inverted_index.knowledge-based | 96 |
| abstract_inverted_index.reasoning-based | 94 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 21 |
| citation_normalized_percentile | |
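
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions it occupies, rather than as running text. The following is a minimal sketch of how such rows could be turned back into readable text. The function names `parse_payload_rows` and `rebuild_abstract` are hypothetical helpers written for this illustration, not part of any OpenAlex client, and the sample rows are an excerpt trimmed to positions 0 through 11.

```python
from typing import Dict, List

PREFIX = "abstract_inverted_index."  # key prefix used by the rows in the payload table


def parse_payload_rows(rows: List[str], prefix: str = PREFIX) -> Dict[str, List[int]]:
    """Parse rows like '| abstract_inverted_index.of | 5, 7 |' into {token: positions}."""
    index: Dict[str, List[int]] = {}
    for row in rows:
        # Drop the outer pipes, then split into the two cells of the row.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(prefix):
            continue  # not an inverted-index row
        token = cells[0][len(prefix):]  # tokens may contain dots, e.g. "i.e.,"
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place every token at each of its positions and join them in position order."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload rows, trimmed to positions 0..11 for illustration.
sample_rows = [
    "| abstract_inverted_index.of | 5, 7 |",
    "| abstract_inverted_index.and | 10 |",
    "| abstract_inverted_index.Chinese | 0, 8 |",
    "| abstract_inverted_index.ancient | 1 |",
    "| abstract_inverted_index.history | 9 |",
    "| abstract_inverted_index.carriers | 4 |",
    "| abstract_inverted_index.culture, | 11 |",
    "| abstract_inverted_index.millennia | 6 |",
    "| abstract_inverted_index.documents, | 2 |",
    "| abstract_inverted_index.invaluable | 3 |",
    "| language | en |",  # unrelated row, skipped by the parser
]

if __name__ == "__main__":
    print(rebuild_abstract(parse_payload_rows(sample_rows)))
```

Run against all of the `abstract_inverted_index.*` rows, the same two steps should yield the full abstract with its original punctuation, since punctuation is stored as part of each token.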