Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2505.14059
Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Current approaches either assemble specialized expert models or directly generate page-level content autoregressively, facing integration overhead, efficiency bottlenecks, and layout structure degradation despite their decent performance. To address these limitations, we present Dolphin (Document Image Parsing via Heterogeneous Anchor Prompting), a novel multimodal document image parsing model following an analyze-then-parse paradigm. In the first stage, Dolphin generates a sequence of layout elements in reading order. These heterogeneous elements, serving as anchors and coupled with task-specific prompts, are fed back to Dolphin for parallel content parsing in the second stage. To train Dolphin, we construct a large-scale dataset of over 30 million samples, covering multi-granularity parsing tasks. Through comprehensive evaluations on both prevalent benchmarks and self-constructed ones, Dolphin achieves state-of-the-art performance across diverse page-level and element-level settings, while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism. The code and pre-trained models are publicly available at https://github.com/ByteDance/Dolphin
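The analyze-then-parse flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Anchor` type, the `PROMPTS` table, and the `analyze_layout` and `parse_element` stubs are hypothetical stand-ins, not Dolphin's actual API or prompts. The thread pool is only a stand-in for parallelism; how Dolphin batches its second-stage decoding is not stated in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical prompt table: the real task-specific prompts are not given in the abstract.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Transcribe the formula in this region.",
    "figure": "Describe the figure in this region.",
}


@dataclass(frozen=True)
class Anchor:
    kind: str   # element type, e.g. "text" or "table"
    box: tuple  # (x0, y0, x1, y1) page region; hypothetical format


def analyze_layout(page):
    """Stage 1 stand-in: return layout elements in reading order."""
    # A real model would predict these from the page image;
    # here they are simply read from the fake page dictionary.
    return [Anchor(kind, box) for kind, box in page["regions"]]


def parse_element(page, anchor):
    """Stage 2 stand-in: parse one element from its anchor and prompt."""
    prompt = PROMPTS[anchor.kind]
    return f"[{anchor.kind}] {prompt} @ {anchor.box}"


def parse_page(page, max_workers=4):
    anchors = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the
        # reading order from stage 1 is preserved in the output.
        return list(pool.map(lambda a: parse_element(page, a), anchors))


page = {"regions": [("text", (0, 0, 100, 20)), ("table", (0, 30, 100, 80))]}
results = parse_page(page)
```

The point of the sketch is the control flow: one sequential pass produces ordered anchors, and the per-element work that follows is independent, so it can run concurrently without losing reading order.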
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2505.14059, https://arxiv.org/pdf/2505.14059
- OA Status: green
- OpenAlex ID: https://openalex.org/W4417298651
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4417298651 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2505.14059 (Digital Object Identifier)
- Title: Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-05-20 (full publication date if available)
- Authors: Hao Feng, Shu Wei, Xiang Fei, Wei Shi, Yingdong Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, Hao Liu, Can Huang (list of authors in order, following the raw author names in the record)
- Landing page: https://arxiv.org/abs/2505.14059 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2505.14059 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2505.14059 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4417298651 |
| doi | https://doi.org/10.48550/arxiv.2505.14059 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.14059 |
| ids.openalex | https://openalex.org/W4417298651 |
| fwci | |
| type | preprint |
| title | Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.14059 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.14059 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.14059 |
| locations[1].id | doi:10.48550/arxiv.2505.14059 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.14059 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5005177314 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0899-0438 |
| authorships[0].author.display_name | Hao Feng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Feng, Hao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5030583530 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0890-2634 |
| authorships[1].author.display_name | Wei Shu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wei, Shu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100660005 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8271-1815 |
| authorships[2].author.display_name | Xiang Fei |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Fei, Xiang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5108434351 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-3554-0202 |
| authorships[3].author.display_name | Wei Shi |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Shi, Wei |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5049863437 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1093-1381 |
| authorships[4].author.display_name | Yingdong Han |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Han, Yingdong |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5036022527 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-1325-2410 |
| authorships[5].author.display_name | Lei Liao |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Liao, Lei |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5050212540 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-7149-6961 |
| authorships[6].author.display_name | Jinghui Lu |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Lu, Jinghui |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5036758849 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-3361-2260 |
| authorships[7].author.display_name | Binghong Wu |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Wu, Binghong |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5100393403 |
| authorships[8].author.orcid | https://orcid.org/0009-0000-4144-938X |
| authorships[8].author.display_name | Qi Liu |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Liu, Qi |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5101652889 |
| authorships[9].author.orcid | https://orcid.org/0000-0003-4661-4131 |
| authorships[9].author.display_name | Chih‐Yu Lin |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Lin, Chunhui |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5100681719 |
| authorships[10].author.orcid | https://orcid.org/0000-0003-2577-0119 |
| authorships[10].author.display_name | Jingqun Tang |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Tang, Jingqun |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5100458818 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-5248-935X |
| authorships[11].author.display_name | Hao Liu |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Liu, Hao |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5035977616 |
| authorships[12].author.orcid | https://orcid.org/0000-0001-7395-2270 |
| authorships[12].author.display_name | Can Huang |
| authorships[12].author_position | last |
| authorships[12].raw_author_name | Huang, Can |
| authorships[12].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.14059 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-13T14:46:20.555162 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2505.14059 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2505.14059 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2505.14059 |
| primary_location.id | pmh:oai:arXiv.org:2505.14059 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2505.14059 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2505.14059 |
| publication_date | 2025-05-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 59, 76, 113 |
| abstract_inverted_index.30 | 118 |
| abstract_inverted_index.In | 70 |
| abstract_inverted_index.To | 45, 108 |
| abstract_inverted_index.an | 67 |
| abstract_inverted_index.as | 12, 88 |
| abstract_inverted_index.at | 165 |
| abstract_inverted_index.in | 81, 104 |
| abstract_inverted_index.is | 3 |
| abstract_inverted_index.of | 78, 116 |
| abstract_inverted_index.on | 128 |
| abstract_inverted_index.or | 26 |
| abstract_inverted_index.to | 6, 98 |
| abstract_inverted_index.we | 49, 111 |
| abstract_inverted_index.The | 157 |
| abstract_inverted_index.and | 17, 37, 90, 132, 142, 153, 159 |
| abstract_inverted_index.are | 95, 162 |
| abstract_inverted_index.due | 5 |
| abstract_inverted_index.fed | 96 |
| abstract_inverted_index.for | 100 |
| abstract_inverted_index.its | 7, 150 |
| abstract_inverted_index.the | 71, 105 |
| abstract_inverted_index.via | 55 |
| abstract_inverted_index.back | 97 |
| abstract_inverted_index.both | 129 |
| abstract_inverted_index.code | 158 |
| abstract_inverted_index.over | 117 |
| abstract_inverted_index.such | 11 |
| abstract_inverted_index.text | 13 |
| abstract_inverted_index.with | 92 |
| abstract_inverted_index.Image | 53 |
| abstract_inverted_index.These | 84 |
| abstract_inverted_index.first | 72 |
| abstract_inverted_index.image | 1, 63 |
| abstract_inverted_index.model | 65 |
| abstract_inverted_index.novel | 60 |
| abstract_inverted_index.ones, | 134 |
| abstract_inverted_index.their | 42 |
| abstract_inverted_index.these | 47 |
| abstract_inverted_index.train | 109 |
| abstract_inverted_index.while | 145 |
| abstract_inverted_index.Anchor | 57 |
| abstract_inverted_index.across | 139 |
| abstract_inverted_index.decent | 43 |
| abstract_inverted_index.either | 21 |
| abstract_inverted_index.expert | 24 |
| abstract_inverted_index.facing | 32 |
| abstract_inverted_index.layout | 38, 79 |
| abstract_inverted_index.models | 25, 161 |
| abstract_inverted_index.order. | 83 |
| abstract_inverted_index.second | 106 |
| abstract_inverted_index.stage, | 73 |
| abstract_inverted_index.stage. | 107 |
| abstract_inverted_index.tasks. | 124 |
| abstract_inverted_index.Current | 19 |
| abstract_inverted_index.Dolphin | 74, 99, 135 |
| abstract_inverted_index.Through | 125 |
| abstract_inverted_index.address | 46 |
| abstract_inverted_index.anchors | 89 |
| abstract_inverted_index.content | 30, 102 |
| abstract_inverted_index.coupled | 91 |
| abstract_inverted_index.dataset | 115 |
| abstract_inverted_index.despite | 41 |
| abstract_inverted_index.diverse | 140 |
| abstract_inverted_index.million | 119 |
| abstract_inverted_index.parsing | 2, 64, 103, 123, 155 |
| abstract_inverted_index.present | 50 |
| abstract_inverted_index.reading | 82 |
| abstract_inverted_index.serving | 87 |
| abstract_inverted_index.tables. | 18 |
| abstract_inverted_index.through | 149 |
| abstract_inverted_index.Document | 0 |
| abstract_inverted_index.Dolphin, | 110 |
| abstract_inverted_index.achieves | 136 |
| abstract_inverted_index.assemble | 22 |
| abstract_inverted_index.covering | 121 |
| abstract_inverted_index.directly | 27 |
| abstract_inverted_index.document | 62 |
| abstract_inverted_index.elements | 10, 80 |
| abstract_inverted_index.ensuring | 146 |
| abstract_inverted_index.figures, | 15 |
| abstract_inverted_index.generate | 28 |
| abstract_inverted_index.parallel | 101, 154 |
| abstract_inverted_index.prompts, | 94 |
| abstract_inverted_index.publicly | 163 |
| abstract_inverted_index.samples, | 120 |
| abstract_inverted_index.sequence | 77 |
| abstract_inverted_index.superior | 147 |
| abstract_inverted_index.available | 164 |
| abstract_inverted_index.complexly | 8 |
| abstract_inverted_index.construct | 112 |
| abstract_inverted_index.elements, | 86 |
| abstract_inverted_index.following | 66 |
| abstract_inverted_index.formulas, | 16 |
| abstract_inverted_index.generates | 75 |
| abstract_inverted_index.overhead, | 34 |
| abstract_inverted_index.paradigm. | 69 |
| abstract_inverted_index.prevalent | 130 |
| abstract_inverted_index.settings, | 144 |
| abstract_inverted_index.structure | 39 |
| abstract_inverted_index.approaches | 20 |
| abstract_inverted_index.benchmarks | 131 |
| abstract_inverted_index.efficiency | 35, 148 |
| abstract_inverted_index.mechanism. | 156 |
| abstract_inverted_index.multimodal | 61 |
| abstract_inverted_index.page-level | 29, 141 |
| abstract_inverted_index.challenging | 4 |
| abstract_inverted_index.degradation | 40 |
| abstract_inverted_index.evaluations | 127 |
| abstract_inverted_index.integration | 33 |
| abstract_inverted_index.intertwined | 9 |
| abstract_inverted_index.large-scale | 114 |
| abstract_inverted_index.lightweight | 151 |
| abstract_inverted_index.paragraphs, | 14 |
| abstract_inverted_index.performance | 138 |
| abstract_inverted_index.pre-trained | 160 |
| abstract_inverted_index.specialized | 23 |
| abstract_inverted_index.architecture | 152 |
| abstract_inverted_index.bottlenecks, | 36 |
| abstract_inverted_index.limitations, | 48 |
| abstract_inverted_index.performance. | 44 |
| abstract_inverted_index.comprehensive | 126 |
| abstract_inverted_index.element-level | 143 |
| abstract_inverted_index.heterogeneous | 85 |
| abstract_inverted_index.task-specific | 93 |
| abstract_inverted_index.\textbf{P}arsing | 54 |
| abstract_inverted_index.\textit{Dolphin} | 51 |
| abstract_inverted_index.self-constructed | 133 |
| abstract_inverted_index.state-of-the-art | 137 |
| abstract_inverted_index.autoregressively, | 31 |
| abstract_inverted_index.multi-granularity | 122 |
| abstract_inverted_index.analyze-then-parse | 68 |
| abstract_inverted_index.Prompt\textbf{in}g}), | 58 |
| abstract_inverted_index.\textbf{H}eterogeneous | 56 |
| abstract_inverted_index.(\textit{\textbf{Do}cument | 52 |
| abstract_inverted_index.https://github.com/ByteDance/Dolphin | 166 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 13 |
| citation_normalized_percentile | |
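
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is an assumption for illustration: it holds only the tokens at positions 0 through 7, with position lists trimmed to that range.

```python
def rebuild_abstract(inverted_index):
    """Turn a {token: [positions]} map back into the token sequence."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Join tokens in ascending position order to recover the text.
    return " ".join(by_position[i] for i in sorted(by_position))


# Tokens at positions 0-7 of the record, trimmed for illustration.
sample = {
    "Document": [0],
    "image": [1],
    "parsing": [2],
    "is": [3],
    "challenging": [4],
    "due": [5],
    "to": [6],
    "its": [7],
}
abstract_start = rebuild_abstract(sample)
print(abstract_start)  # Document image parsing is challenging due to its
```

Applied to the full index, the same function would reproduce the abstract, including the raw LaTeX tokens such as `\textit{Dolphin}` that the index stores verbatim.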