Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2507.06275
Offline Handwritten Text Recognition (HTR) systems play a crucial role in applications such as historical document digitization, automatic form processing, and biometric authentication. However, their performance is often hindered by the limited availability of annotated training data, particularly for low-resource languages and complex scripts. This paper presents a comprehensive survey of offline handwritten data augmentation and generation techniques designed to improve the accuracy and robustness of HTR systems. We systematically examine traditional augmentation methods alongside recent advances in deep learning, including Generative Adversarial Networks (GANs), diffusion models, and transformer-based approaches. Furthermore, we explore the challenges associated with generating diverse and realistic handwriting samples, particularly in preserving script authenticity and addressing data scarcity. This survey follows the PRISMA methodology, ensuring a structured and rigorous selection process. Our analysis began with 1,302 primary studies, which were filtered down to 848 after removing duplicates, drawing from key academic sources such as IEEE Digital Library, Springer Link, Science Direct, and ACM Digital Library. By evaluating existing datasets, assessment metrics, and state-of-the-art methodologies, this survey identifies key research gaps and proposes future directions to advance the field of handwritten text generation across diverse linguistic and stylistic landscapes.
- Type: review
- Language: en
- Landing Page: http://arxiv.org/abs/2507.06275, https://arxiv.org/pdf/2507.06275
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416101313
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416101313 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2507.06275 (Digital Object Identifier)
- Title: Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques (work title)
- Type: review (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-07-08 (full publication date if available)
- Authors: Yassin Hussein Rassul, Aram M. Ahmed, Polla Fattah, Bryar A. Hassan, Arwaa W. Abdulkareem, Tarik A. Rashid, Joan Lu (list of authors in order)
- Landing page: https://arxiv.org/abs/2507.06275 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2507.06275 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2507.06275 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
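
The DOI, landing page, and PDF URL listed above all share the same arXiv identifier. As a minimal sketch, assuming the usual arXiv DOI pattern `10.48550/arXiv.<id>` and the `arxiv.org/abs/<id>` and `arxiv.org/pdf/<id>` URL layouts seen in this record, the links can be derived from the DOI alone. The function name `arxiv_links_from_doi` is hypothetical and not part of any OpenAlex or arXiv library.

```python
import re

# Assumed DOI prefix for arXiv-registered DOIs (compared case-insensitively).
ARXIV_DOI_PREFIX = "10.48550/arxiv."


def arxiv_links_from_doi(doi_url: str) -> dict:
    """Derive the arXiv identifier and its abs/pdf URLs from an arXiv DOI."""
    # Strip an optional resolver prefix such as https://doi.org/
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi_url.strip(), flags=re.IGNORECASE)
    if not doi.lower().startswith(ARXIV_DOI_PREFIX):
        raise ValueError(f"not an arXiv DOI: {doi_url!r}")
    arxiv_id = doi[len(ARXIV_DOI_PREFIX):]
    return {
        "arxiv_id": arxiv_id,
        "landing_page": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
    }


links = arxiv_links_from_doi("https://doi.org/10.48550/arxiv.2507.06275")
print(links["landing_page"])
print(links["pdf"])
```

For this record the derived URLs match the landing page and PDF URL given in the list.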
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416101313 |
| doi | https://doi.org/10.48550/arxiv.2507.06275 |
| ids.doi | https://doi.org/10.48550/arxiv.2507.06275 |
| ids.openalex | https://openalex.org/W4416101313 |
| fwci | |
| type | review |
| title | Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2507.06275 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2507.06275 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2507.06275 |
| locations[1].id | doi:10.48550/arxiv.2507.06275 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2507.06275 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5114424925 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Yassin Hussein Rassul |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Rassul, Yassin Hussein |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5085426579 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5640-0172 |
| authorships[1].author.display_name | Aram M. Ahmed |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ahmed, Aram M. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5023793448 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8027-3540 |
| authorships[2].author.display_name | Polla Fattah |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Fattah, Polla |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5052779066 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4476-9351 |
| authorships[3].author.display_name | Bryar A. Hassan |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hassan, Bryar A. |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5002577537 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7939-3761 |
| authorships[4].author.display_name | Ademola Abdulkareem |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Abdulkareem, Arwaa W. |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5065801825 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-8661-258X |
| authorships[5].author.display_name | Tarik A. Rashid |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Rashid, Tarik A. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5002204703 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-0585-2806 |
| authorships[6].author.display_name | Joan Lu |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Lu, Joan |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2507.06275 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T08:54:43.696151 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2507.06275 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2507.06275 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2507.06275 |
| primary_location.id | pmh:oai:arXiv.org:2507.06275 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2507.06275 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2507.06275 |
| publication_date | 2025-07-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 7, 47, 119 |
| abstract_inverted_index.By | 159 |
| abstract_inverted_index.We | 68 |
| abstract_inverted_index.as | 13, 147 |
| abstract_inverted_index.by | 29 |
| abstract_inverted_index.in | 10, 77, 104 |
| abstract_inverted_index.is | 26 |
| abstract_inverted_index.of | 33, 50, 65, 182 |
| abstract_inverted_index.to | 59, 136, 178 |
| abstract_inverted_index.we | 91 |
| abstract_inverted_index.848 | 137 |
| abstract_inverted_index.ACM | 156 |
| abstract_inverted_index.HTR | 66 |
| abstract_inverted_index.Our | 125 |
| abstract_inverted_index.and | 20, 41, 55, 63, 87, 99, 108, 121, 155, 165, 174, 189 |
| abstract_inverted_index.for | 38 |
| abstract_inverted_index.key | 143, 171 |
| abstract_inverted_index.the | 30, 61, 93, 115, 180 |
| abstract_inverted_index.IEEE | 148 |
| abstract_inverted_index.Text | 2 |
| abstract_inverted_index.This | 44, 112 |
| abstract_inverted_index.data | 53, 110 |
| abstract_inverted_index.deep | 78 |
| abstract_inverted_index.down | 135 |
| abstract_inverted_index.form | 18 |
| abstract_inverted_index.from | 142 |
| abstract_inverted_index.gaps | 173 |
| abstract_inverted_index.play | 6 |
| abstract_inverted_index.role | 9 |
| abstract_inverted_index.such | 12, 146 |
| abstract_inverted_index.text | 184 |
| abstract_inverted_index.this | 168 |
| abstract_inverted_index.were | 133 |
| abstract_inverted_index.with | 96, 128 |
| abstract_inverted_index.(HTR) | 4 |
| abstract_inverted_index.1,302 | 129 |
| abstract_inverted_index.Link, | 152 |
| abstract_inverted_index.after | 138 |
| abstract_inverted_index.began | 127 |
| abstract_inverted_index.data, | 36 |
| abstract_inverted_index.field | 181 |
| abstract_inverted_index.often | 27 |
| abstract_inverted_index.paper | 45 |
| abstract_inverted_index.their | 24 |
| abstract_inverted_index.which | 132 |
| abstract_inverted_index.PRISMA | 116 |
| abstract_inverted_index.across | 186 |
| abstract_inverted_index.future | 176 |
| abstract_inverted_index.recent | 75 |
| abstract_inverted_index.script | 106 |
| abstract_inverted_index.survey | 49, 113, 169 |
| abstract_inverted_index.(GANs), | 84 |
| abstract_inverted_index.Digital | 149, 157 |
| abstract_inverted_index.Direct, | 154 |
| abstract_inverted_index.Offline | 0 |
| abstract_inverted_index.Science | 153 |
| abstract_inverted_index.advance | 179 |
| abstract_inverted_index.complex | 42 |
| abstract_inverted_index.crucial | 8 |
| abstract_inverted_index.diverse | 98, 187 |
| abstract_inverted_index.drawing | 141 |
| abstract_inverted_index.examine | 70 |
| abstract_inverted_index.explore | 92 |
| abstract_inverted_index.follows | 114 |
| abstract_inverted_index.improve | 60 |
| abstract_inverted_index.limited | 31 |
| abstract_inverted_index.methods | 73 |
| abstract_inverted_index.models, | 86 |
| abstract_inverted_index.offline | 51 |
| abstract_inverted_index.primary | 130 |
| abstract_inverted_index.sources | 145 |
| abstract_inverted_index.systems | 5 |
| abstract_inverted_index.However, | 23 |
| abstract_inverted_index.Library, | 150 |
| abstract_inverted_index.Library. | 158 |
| abstract_inverted_index.Networks | 83 |
| abstract_inverted_index.Springer | 151 |
| abstract_inverted_index.academic | 144 |
| abstract_inverted_index.accuracy | 62 |
| abstract_inverted_index.advances | 76 |
| abstract_inverted_index.analysis | 126 |
| abstract_inverted_index.designed | 58 |
| abstract_inverted_index.document | 15 |
| abstract_inverted_index.ensuring | 118 |
| abstract_inverted_index.existing | 161 |
| abstract_inverted_index.filtered | 134 |
| abstract_inverted_index.hindered | 28 |
| abstract_inverted_index.metrics, | 164 |
| abstract_inverted_index.presents | 46 |
| abstract_inverted_index.process. | 124 |
| abstract_inverted_index.proposes | 175 |
| abstract_inverted_index.removing | 139 |
| abstract_inverted_index.research | 172 |
| abstract_inverted_index.rigorous | 122 |
| abstract_inverted_index.samples, | 102 |
| abstract_inverted_index.scripts. | 43 |
| abstract_inverted_index.studies, | 131 |
| abstract_inverted_index.systems. | 67 |
| abstract_inverted_index.training | 35 |
| abstract_inverted_index.alongside | 74 |
| abstract_inverted_index.annotated | 34 |
| abstract_inverted_index.automatic | 17 |
| abstract_inverted_index.biometric | 21 |
| abstract_inverted_index.datasets, | 162 |
| abstract_inverted_index.diffusion | 85 |
| abstract_inverted_index.including | 80 |
| abstract_inverted_index.languages | 40 |
| abstract_inverted_index.learning, | 79 |
| abstract_inverted_index.realistic | 100 |
| abstract_inverted_index.scarcity. | 111 |
| abstract_inverted_index.selection | 123 |
| abstract_inverted_index.stylistic | 190 |
| abstract_inverted_index.Generative | 81 |
| abstract_inverted_index.addressing | 109 |
| abstract_inverted_index.assessment | 163 |
| abstract_inverted_index.associated | 95 |
| abstract_inverted_index.challenges | 94 |
| abstract_inverted_index.directions | 177 |
| abstract_inverted_index.evaluating | 160 |
| abstract_inverted_index.generating | 97 |
| abstract_inverted_index.generation | 56, 185 |
| abstract_inverted_index.historical | 14 |
| abstract_inverted_index.identifies | 170 |
| abstract_inverted_index.linguistic | 188 |
| abstract_inverted_index.preserving | 105 |
| abstract_inverted_index.robustness | 64 |
| abstract_inverted_index.structured | 120 |
| abstract_inverted_index.techniques | 57 |
| abstract_inverted_index.Adversarial | 82 |
| abstract_inverted_index.Handwritten | 1 |
| abstract_inverted_index.Recognition | 3 |
| abstract_inverted_index.approaches. | 89 |
| abstract_inverted_index.duplicates, | 140 |
| abstract_inverted_index.handwriting | 101 |
| abstract_inverted_index.handwritten | 52, 183 |
| abstract_inverted_index.landscapes. | 191 |
| abstract_inverted_index.performance | 25 |
| abstract_inverted_index.processing, | 19 |
| abstract_inverted_index.traditional | 71 |
| abstract_inverted_index.Furthermore, | 90 |
| abstract_inverted_index.applications | 11 |
| abstract_inverted_index.augmentation | 54, 72 |
| abstract_inverted_index.authenticity | 107 |
| abstract_inverted_index.availability | 32 |
| abstract_inverted_index.low-resource | 39 |
| abstract_inverted_index.methodology, | 117 |
| abstract_inverted_index.particularly | 37, 103 |
| abstract_inverted_index.comprehensive | 48 |
| abstract_inverted_index.digitization, | 16 |
| abstract_inverted_index.methodologies, | 167 |
| abstract_inverted_index.systematically | 69 |
| abstract_inverted_index.authentication. | 22 |
| abstract_inverted_index.state-of-the-art | 166 |
| abstract_inverted_index.transformer-based | 88 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
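
The `abstract_inverted_index.*` rows in the table store the abstract as a map from each token to the word positions where it occurs. As a rough sketch of how the plain-text abstract can be recovered from such a map, the snippet below places every token at its positions and joins them in order. The helper name `abstract_from_inverted_index` is hypothetical, and the sample dictionary is only an excerpt covering positions 0 to 9 of the rows above, not the full index.

```python
from typing import Dict, List


def abstract_from_inverted_index(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the table rows above, truncated to positions 0-9 for illustration
# (for example, "a" also occurs at 47 and 119 in the full index).
sample_index = {
    "Offline": [0],
    "Handwritten": [1],
    "Text": [2],
    "Recognition": [3],
    "(HTR)": [4],
    "systems": [5],
    "play": [6],
    "a": [7],
    "crucial": [8],
    "role": [9],
}

opening = abstract_from_inverted_index(sample_index)
print(opening)  # Offline Handwritten Text Recognition (HTR) systems play a crucial role
```

Applied to the full set of rows, the same procedure should reproduce the abstract shown at the top of the page, since punctuation is stored as part of each token.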