SAIL: Sample-Centric In-Context Learning for Document Information Extraction
2025 · Open Access
· DOI: https://doi.org/10.1609/aaai.v39i24.34780
Document Information Extraction (DIE) aims to extract structured information from Visually Rich Documents (VRDs). Previous full-training approaches have demonstrated strong performance but may struggle to generalize to unseen data. In contrast, training-free methods leverage powerful pre-trained models such as Large Language Models (LLMs) to address various downstream tasks with only a few examples. Nonetheless, training-free methods for DIE encounter two primary challenges: (1) understanding the complex relationship between layout and textual elements in VRDs, and (2) providing accurate guidance to pre-trained models. To address these challenges, we propose SAmple-centric In-context Learning (SAIL). SAIL introduces a fine-grained entity-level textual similarity to facilitate in-depth text analysis by LLMs and incorporates layout similarity to enhance the analysis of layouts in VRDs. Moreover, SAIL formulates a unified In-Context Learning (ICL) prompt template for various sample-centric examples, enabling tailored prompts that deliver precise guidance to pre-trained models for each sample. Extensive experiments on the FUNSD, CORD, and SROIE benchmarks with various base models (e.g., LLMs) indicate that SAIL outperforms training-free baselines and even comes close to full-training methods, showing the superiority and generalization of our method.
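The abstract names three ingredients: entity-level textual similarity, layout similarity, and a unified prompt template filled with examples chosen per sample. The toy sketch below illustrates that general idea only and is not the authors' implementation. Every name in it is hypothetical, `difflib.SequenceMatcher` stands in for whatever text-similarity measure the paper uses, bounding-box IoU stands in for its layout similarity, and the labels and documents are made-up illustrations.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1), normalized to [0, 1]
Entity = Tuple[str, Box, str]             # (text, box, label); hypothetical format


def text_similarity(a: str, b: str) -> float:
    """Stand-in textual similarity: character-level match ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def layout_similarity(query_boxes: List[Box], doc_boxes: List[Box]) -> float:
    """Stand-in layout similarity: mean best IoU of each query box."""
    if not query_boxes or not doc_boxes:
        return 0.0
    best = [max(box_iou(q, d) for d in doc_boxes) for q in query_boxes]
    return sum(best) / len(best)


def pick_text_examples(query_texts: List[str], pool: List[Entity]) -> List[Entity]:
    """For each query entity, keep the most textually similar labelled entity."""
    picked: List[Entity] = []
    for q in query_texts:
        top = max(pool, key=lambda e: text_similarity(q, e[0]))
        if top not in picked:
            picked.append(top)
    return picked


def build_prompt(query, text_examples, layout_doc) -> str:
    """Fill one fixed template with the examples chosen for this sample."""
    lines = ["Label each entity of the document.", "Textually similar examples:"]
    lines += [f'  "{t}" -> {lab}' for t, _, lab in text_examples]
    lines.append("Layout-similar example document:")
    lines += [f'  {box} "{t}" -> {lab}' for t, box, lab in layout_doc]
    lines.append("Document to label:")
    lines += [f'  {box} "{t}" -> ?' for t, box in query]
    return "\n".join(lines)


# Made-up labelled pool (two tiny documents) and one unlabelled query document.
pool_docs: List[List[Entity]] = [
    [("Total:", (0.1, 0.8, 0.3, 0.85), "question"),
     ("$12.50", (0.6, 0.8, 0.8, 0.85), "answer")],
    [("Invoice", (0.4, 0.05, 0.6, 0.1), "header"),
     ("Date", (0.1, 0.2, 0.2, 0.25), "question")],
]
query = [("TOTAL", (0.1, 0.8, 0.3, 0.85)), ("$8.00", (0.6, 0.8, 0.8, 0.85))]

all_entities = [e for doc in pool_docs for e in doc]
text_examples = pick_text_examples([t for t, _ in query], all_entities)
query_boxes = [b for _, b in query]
best_layout_index = max(
    range(len(pool_docs)),
    key=lambda i: layout_similarity(query_boxes, [b for _, b, _ in pool_docs[i]]),
)
prompt = build_prompt(query, text_examples, pool_docs[best_layout_index])
print(prompt)
```

The point of the sketch is that example selection happens per query document, so two different inputs can receive different demonstrations inside the same template.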
Related Topics
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.1609/aaai.v39i24.34780
- OA Status
- diamond
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4409363075
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex)
- https://openalex.org/W4409363075
- DOI (Digital Object Identifier)
- https://doi.org/10.1609/aaai.v39i24.34780
- Title (work title)
- SAIL: Sample-Centric In-Context Learning for Document Information Extraction
- Type (OpenAlex work type)
- article
- Language (primary language)
- en
- Publication year (year of publication)
- 2025
- Publication date (full publication date if available)
- 2025-04-11
- Authors (list of authors in order)
- Jinyu Zhang, Zhiyuan You, Wang Jie, Xinyi Le
- Landing page (publisher landing page)
- https://doi.org/10.1609/aaai.v39i24.34780
- Open access (whether a free full text is available)
- Yes
- OA status (open access status per OpenAlex)
- diamond
- OA URL (direct OA link when available)
- https://doi.org/10.1609/aaai.v39i24.34780
- Concepts (top fields/topics attached by OpenAlex)
- Context (archaeology), Computer science, Sample (material), Information retrieval, Information extraction, Data science, World Wide Web, Geography, Archaeology, Chemistry, Chromatography
- Cited by (total citation count in OpenAlex)
- 0
- Related works count (other works algorithmically related by OpenAlex)
- 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4409363075 |
| doi | https://doi.org/10.1609/aaai.v39i24.34780 |
| ids.doi | https://doi.org/10.1609/aaai.v39i24.34780 |
| ids.openalex | https://openalex.org/W4409363075 |
| fwci | 0.0 |
| type | article |
| title | SAIL: Sample-Centric In-Context Learning for Document Information Extraction |
| biblio.issue | 24 |
| biblio.volume | 39 |
| biblio.last_page | 25876 |
| biblio.first_page | 25868 |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8521999716758728 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T11550 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8472999930381775 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Text and Document Classification Technologies |
| topics[2].id | https://openalex.org/T10601 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.82669997215271 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Handwritten Text Recognition Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2779343474 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6582761406898499 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[0].display_name | Context (archaeology) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5682976245880127 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C198531522 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5389114022254944 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q485146 |
| concepts[2].display_name | Sample (material) |
| concepts[3].id | https://openalex.org/C23123220 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5275344848632812 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[3].display_name | Information retrieval |
| concepts[4].id | https://openalex.org/C195807954 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4370875358581543 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1662562 |
| concepts[4].display_name | Information extraction |
| concepts[5].id | https://openalex.org/C2522767166 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3725430965423584 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[5].display_name | Data science |
| concepts[6].id | https://openalex.org/C136764020 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3714311420917511 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[6].display_name | World Wide Web |
| concepts[7].id | https://openalex.org/C205649164 |
| concepts[7].level | 0 |
| concepts[7].score | 0.1717430055141449 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[7].display_name | Geography |
| concepts[8].id | https://openalex.org/C166957645 |
| concepts[8].level | 1 |
| concepts[8].score | 0.07497230172157288 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q23498 |
| concepts[8].display_name | Archaeology |
| concepts[9].id | https://openalex.org/C185592680 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[9].display_name | Chemistry |
| concepts[10].id | https://openalex.org/C43617362 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q170050 |
| concepts[10].display_name | Chromatography |
| keywords[0].id | https://openalex.org/keywords/context |
| keywords[0].score | 0.6582761406898499 |
| keywords[0].display_name | Context (archaeology) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5682976245880127 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/sample |
| keywords[2].score | 0.5389114022254944 |
| keywords[2].display_name | Sample (material) |
| keywords[3].id | https://openalex.org/keywords/information-retrieval |
| keywords[3].score | 0.5275344848632812 |
| keywords[3].display_name | Information retrieval |
| keywords[4].id | https://openalex.org/keywords/information-extraction |
| keywords[4].score | 0.4370875358581543 |
| keywords[4].display_name | Information extraction |
| keywords[5].id | https://openalex.org/keywords/data-science |
| keywords[5].score | 0.3725430965423584 |
| keywords[5].display_name | Data science |
| keywords[6].id | https://openalex.org/keywords/world-wide-web |
| keywords[6].score | 0.3714311420917511 |
| keywords[6].display_name | World Wide Web |
| keywords[7].id | https://openalex.org/keywords/geography |
| keywords[7].score | 0.1717430055141449 |
| keywords[7].display_name | Geography |
| keywords[8].id | https://openalex.org/keywords/archaeology |
| keywords[8].score | 0.07497230172157288 |
| keywords[8].display_name | Archaeology |
| language | en |
| locations[0].id | doi:10.1609/aaai.v39i24.34780 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v39i24.34780 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5100435004 |
| authorships[0].author.orcid | https://orcid.org/0009-0009-7231-3368 |
| authorships[0].author.display_name | Jinyu Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jinyu Zhang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5039830847 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Zhiyuan You |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhiyuan You |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101828800 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-9704-3725 |
| authorships[2].author.display_name | Wang Jie |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Jize Wang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5052481491 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0318-9497 |
| authorships[3].author.display_name | Xinyi Le |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Xinyi Le |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.1609/aaai.v39i24.34780 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | SAIL: Sample-Centric In-Context Learning for Document Information Extraction |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8521999716758728 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W1185300216, https://openalex.org/W2954163146, https://openalex.org/W2896057011, https://openalex.org/W4238675884, https://openalex.org/W2589817099, https://openalex.org/W3033465211, https://openalex.org/W1971600963, https://openalex.org/W74450112, https://openalex.org/W4213212078, https://openalex.org/W1017189767 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aaai.v39i24.34780 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v39i24.34780 |
| primary_location.id | doi:10.1609/aaai.v39i24.34780 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v39i24.34780 |
| publication_date | 2025-04-11 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 49, 93, 120 |
| abstract_inverted_index.In | 29 |
| abstract_inverted_index.To | 81 |
| abstract_inverted_index.by | 103 |
| abstract_inverted_index.in | 71, 115 |
| abstract_inverted_index.of | 113, 176 |
| abstract_inverted_index.on | 146 |
| abstract_inverted_index.to | 5, 26, 42, 78, 98, 109, 138, 167 |
| abstract_inverted_index.we | 85 |
| abstract_inverted_index.(1) | 61 |
| abstract_inverted_index.(2) | 74 |
| abstract_inverted_index.DIE | 56 |
| abstract_inverted_index.and | 68, 73, 105, 149, 174 |
| abstract_inverted_index.but | 21 |
| abstract_inverted_index.few | 50 |
| abstract_inverted_index.for | 55, 127, 141 |
| abstract_inverted_index.may | 22 |
| abstract_inverted_index.our | 160, 177 |
| abstract_inverted_index.the | 63, 111, 168, 172 |
| abstract_inverted_index.two | 58 |
| abstract_inverted_index.LLMs | 104 |
| abstract_inverted_index.Rich | 11 |
| abstract_inverted_index.SAIL | 91, 118, 161 |
| abstract_inverted_index.aims | 4 |
| abstract_inverted_index.base | 154 |
| abstract_inverted_index.each | 142 |
| abstract_inverted_index.even | 165 |
| abstract_inverted_index.from | 9 |
| abstract_inverted_index.have | 17 |
| abstract_inverted_index.like | 37 |
| abstract_inverted_index.only | 48 |
| abstract_inverted_index.text | 101 |
| abstract_inverted_index.that | 134, 159 |
| abstract_inverted_index.with | 24, 47, 152 |
| abstract_inverted_index.(DIE) | 3 |
| abstract_inverted_index.(ICL) | 124 |
| abstract_inverted_index.CORD, | 148 |
| abstract_inverted_index.LLMs) | 157 |
| abstract_inverted_index.Large | 38 |
| abstract_inverted_index.SROIE | 150 |
| abstract_inverted_index.VRDs, | 72 |
| abstract_inverted_index.VRDs. | 116 |
| abstract_inverted_index.data. | 28 |
| abstract_inverted_index.tasks | 46 |
| abstract_inverted_index.these | 83 |
| abstract_inverted_index.(LLMs) | 41 |
| abstract_inverted_index.(e.g., | 156 |
| abstract_inverted_index.FUNSD, | 147 |
| abstract_inverted_index.Models | 40 |
| abstract_inverted_index.closer | 166 |
| abstract_inverted_index.layout | 67, 107 |
| abstract_inverted_index.models | 36, 140, 155 |
| abstract_inverted_index.prompt | 125 |
| abstract_inverted_index.strong | 19 |
| abstract_inverted_index.unseen | 27 |
| abstract_inverted_index.(SAIL). | 90 |
| abstract_inverted_index.(VRDs). | 13 |
| abstract_inverted_index.address | 43, 82 |
| abstract_inverted_index.between | 66 |
| abstract_inverted_index.complex | 64 |
| abstract_inverted_index.deliver | 135 |
| abstract_inverted_index.enhance | 110 |
| abstract_inverted_index.extract | 6 |
| abstract_inverted_index.layouts | 114 |
| abstract_inverted_index.method. | 178 |
| abstract_inverted_index.methods | 32, 54 |
| abstract_inverted_index.models. | 80 |
| abstract_inverted_index.precise | 136 |
| abstract_inverted_index.primary | 59 |
| abstract_inverted_index.prompts | 133 |
| abstract_inverted_index.propose | 86 |
| abstract_inverted_index.sample. | 143 |
| abstract_inverted_index.showing | 171 |
| abstract_inverted_index.textual | 69, 96 |
| abstract_inverted_index.unified | 121 |
| abstract_inverted_index.various | 44, 128, 153 |
| abstract_inverted_index.Document | 0 |
| abstract_inverted_index.Language | 39 |
| abstract_inverted_index.Learning | 89, 123 |
| abstract_inverted_index.Previous | 14 |
| abstract_inverted_index.Visually | 10 |
| abstract_inverted_index.accurate | 76 |
| abstract_inverted_index.analysis | 102, 112 |
| abstract_inverted_index.elements | 70 |
| abstract_inverted_index.enabling | 131 |
| abstract_inverted_index.guidance | 77, 137 |
| abstract_inverted_index.in-depth | 100 |
| abstract_inverted_index.indicate | 158 |
| abstract_inverted_index.leverage | 33 |
| abstract_inverted_index.methods, | 170 |
| abstract_inverted_index.powerful | 34 |
| abstract_inverted_index.struggle | 23 |
| abstract_inverted_index.tailored | 132 |
| abstract_inverted_index.template | 126 |
| abstract_inverted_index.Documents | 12 |
| abstract_inverted_index.Extensive | 144 |
| abstract_inverted_index.Moreover, | 117 |
| abstract_inverted_index.contrast, | 30 |
| abstract_inverted_index.encounter | 57 |
| abstract_inverted_index.examples, | 130 |
| abstract_inverted_index.examples. | 51 |
| abstract_inverted_index.providing | 75 |
| abstract_inverted_index.Extraction | 2 |
| abstract_inverted_index.In-Context | 122 |
| abstract_inverted_index.In-context | 88 |
| abstract_inverted_index.approaches | 16 |
| abstract_inverted_index.baselines, | 164 |
| abstract_inverted_index.benchmarks | 151 |
| abstract_inverted_index.downstream | 45 |
| abstract_inverted_index.facilitate | 99 |
| abstract_inverted_index.formulates | 119 |
| abstract_inverted_index.introduces | 92 |
| abstract_inverted_index.similarity | 97, 108 |
| abstract_inverted_index.structured | 7 |
| abstract_inverted_index.Information | 1 |
| abstract_inverted_index.challenges, | 84 |
| abstract_inverted_index.challenges: | 60 |
| abstract_inverted_index.experiments | 145 |
| abstract_inverted_index.information | 8 |
| abstract_inverted_index.outperforms | 162 |
| abstract_inverted_index.performance | 20 |
| abstract_inverted_index.pre-trained | 35, 79, 139 |
| abstract_inverted_index.superiority | 173 |
| abstract_inverted_index.Nonetheless, | 52 |
| abstract_inverted_index.demonstrated | 18 |
| abstract_inverted_index.entity-level | 95 |
| abstract_inverted_index.fine-grained | 94 |
| abstract_inverted_index.incorporates | 106 |
| abstract_inverted_index.relationship | 65 |
| abstract_inverted_index.full-training | 15, 169 |
| abstract_inverted_index.training-free | 31, 53, 163 |
| abstract_inverted_index.understanding | 62 |
| abstract_inverted_index.SAmple-centric | 87 |
| abstract_inverted_index.generalization | 25, 175 |
| abstract_inverted_index.sample-centric | 129 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.22160804 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
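
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text is shown here. The function name `rebuild_abstract` is hypothetical, and the sample dictionary copies only the rows covering positions 0 to 13, so tokens such as `to` list just the one position that falls in that range.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a token -> positions mapping."""
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            token_at[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(token_at[pos] for pos in sorted(token_at))


# Rows from the payload restricted to positions 0-13 (the first sentence).
sample_index = {
    "Document": [0],
    "Information": [1],
    "Extraction": [2],
    "(DIE)": [3],
    "aims": [4],
    "to": [5],
    "extract": [6],
    "structured": [7],
    "information": [8],
    "from": [9],
    "Visually": [10],
    "Rich": [11],
    "Documents": [12],
    "(VRDs).": [13],
}

first_sentence = rebuild_abstract(sample_index)
print(first_sentence)
```

Feeding the full set of rows through the same function should give back the whole abstract, since every position from 0 to 178 appears exactly once in the index.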