TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.01137
Scene text recognition (STR) suffers from challenges of either less realistic synthetic training data or the difficulty of collecting sufficient high-quality real-world data, limiting the effectiveness of trained models. Meanwhile, despite producing holistically appealing text images, diffusion-based visual text generation methods struggle to synthesize accurate and realistic instance-level text at scale. To tackle this, we introduce TextSSR: a novel pipeline for Synthesizing Scene Text Recognition training data. TextSSR targets three key synthesizing characteristics: accuracy, realism, and scalability. It achieves accuracy through a proposed region-centric text generation with position-glyph enhancement, ensuring proper character placement. It maintains realism by guiding style and appearance generation using contextual hints from surrounding text or background. This character-aware diffusion architecture enjoys precise character-level control and semantic coherence preservation, without relying on natural language prompts. Therefore, TextSSR supports large-scale generation through combinatorial text permutations. Based on these, we present TextSSR-F, a dataset of 3.55 million quality-screened text instances. Extensive experiments show that STR models trained on TextSSR-F outperform those trained on existing synthetic datasets by clear margins on common benchmarks, and further improvements are observed when mixed with real-world training data. Code is available at https://github.com/YesianRohn/TextSSR.
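The abstract attributes scalability to "combinatorial text permutations" but does not spell out how the permutations are formed. The sketch below shows one possible reading, assumed here purely for illustration: drawing distinct character rearrangements of an existing label so that one text region can yield many target strings. The function name `sample_permutations` and its parameters are hypothetical and are not taken from the TextSSR code base.

```python
import random


def sample_permutations(text: str, k: int, seed: int = 0) -> list[str]:
    """Draw up to k distinct rearrangements of the characters in `text`.

    Assumption: "combinatorial text permutations" is read here as character
    shuffling of a label; the paper's actual procedure may differ.
    """
    rng = random.Random(seed)
    chars = list(text)
    seen = {text}  # exclude the original string itself
    variants: list[str] = []
    attempts = 0
    # Bounded retries so short or highly repetitive strings cannot loop forever.
    while len(variants) < k and attempts < 100 * k:
        attempts += 1
        rng.shuffle(chars)
        candidate = "".join(chars)
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants


variants = sample_permutations("SCENE", 5)
for v in variants:
    print(v)
```

Every variant keeps the same characters and length as the source label, so a label of n distinct characters admits up to n! − 1 alternatives; this is the sense in which a permutation-based scheme can scale without natural language prompts.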
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.01137 (PDF: https://arxiv.org/pdf/2412.01137)
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405033944
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405033944 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.01137 (Digital Object Identifier)
- Title: TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-02 (full publication date if available)
- Authors: Xiaoyun Ye, Yongkun Du, Yunbo Tao, Zhineng Chen (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.01137 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.01137 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.01137 (direct OA link when available)
- Concepts: Diffusion, Computer science, Artificial intelligence, Pattern recognition (psychology), Data science, Physics, Thermodynamics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4405033944 |
| doi | https://doi.org/10.48550/arxiv.2412.01137 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.01137 |
| ids.openalex | https://openalex.org/W4405033944 |
| fwci | |
| type | preprint |
| title | TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10601 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9707000255584717 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Handwritten Text Recognition Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C69357855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6412297487258911 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[0].display_name | Diffusion |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5409802198410034 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.364722341299057 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C153180895 |
| concepts[3].level | 2 |
| concepts[3].score | 0.33931028842926025 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[3].display_name | Pattern recognition (psychology) |
| concepts[4].id | https://openalex.org/C2522767166 |
| concepts[4].level | 1 |
| concepts[4].score | 0.33169493079185486 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[4].display_name | Data science |
| concepts[5].id | https://openalex.org/C121332964 |
| concepts[5].level | 0 |
| concepts[5].score | 0.09350809454917908 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[5].display_name | Physics |
| concepts[6].id | https://openalex.org/C97355855 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[6].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/diffusion |
| keywords[0].score | 0.6412297487258911 |
| keywords[0].display_name | Diffusion |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5409802198410034 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.364722341299057 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/pattern-recognition |
| keywords[3].score | 0.33931028842926025 |
| keywords[3].display_name | Pattern recognition (psychology) |
| keywords[4].id | https://openalex.org/keywords/data-science |
| keywords[4].score | 0.33169493079185486 |
| keywords[4].display_name | Data science |
| keywords[5].id | https://openalex.org/keywords/physics |
| keywords[5].score | 0.09350809454917908 |
| keywords[5].display_name | Physics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.01137 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.01137 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.01137 |
| locations[1].id | doi:10.48550/arxiv.2412.01137 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.01137 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5061265309 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7465-9882 |
| authorships[0].author.display_name | Xiaoyun Ye |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ye, Xingsong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102915298 |
| authorships[1].author.orcid | https://orcid.org/0009-0000-9859-721X |
| authorships[1].author.display_name | Yongkun Du |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Du, Yongkun |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5090936749 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5850-3159 |
| authorships[2].author.display_name | Yunbo Tao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Tao, Yunbo |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5080463909 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1543-6889 |
| authorships[3].author.display_name | Zhineng Chen |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Chen, Zhineng |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.01137 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10601 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9707000255584717 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Handwritten Text Recognition Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2033914206, https://openalex.org/W2042327336 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.01137 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.01137 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.01137 |
| primary_location.id | pmh:oai:arXiv.org:2412.01137 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.01137 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.01137 |
| publication_date | 2024-12-02 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 57, 81, 143 |
| abstract_inverted_index.It | 77, 93 |
| abstract_inverted_index.To | 51 |
| abstract_inverted_index.at | 49, 187 |
| abstract_inverted_index.by | 96, 167 |
| abstract_inverted_index.is | 185 |
| abstract_inverted_index.of | 7, 17, 26, 145 |
| abstract_inverted_index.on | 124, 138, 158, 163, 170 |
| abstract_inverted_index.or | 14, 108 |
| abstract_inverted_index.to | 42 |
| abstract_inverted_index.we | 54, 140 |
| abstract_inverted_index.STR | 155 |
| abstract_inverted_index.and | 45, 75, 99, 118, 173 |
| abstract_inverted_index.are | 176 |
| abstract_inverted_index.for | 60 |
| abstract_inverted_index.key | 70 |
| abstract_inverted_index.the | 15, 24 |
| abstract_inverted_index.3.55 | 146 |
| abstract_inverted_index.Code | 184 |
| abstract_inverted_index.Text | 63 |
| abstract_inverted_index.This | 110 |
| abstract_inverted_index.data | 13 |
| abstract_inverted_index.from | 5, 105 |
| abstract_inverted_index.less | 9 |
| abstract_inverted_index.show | 153 |
| abstract_inverted_index.text | 1, 34, 38, 48, 84, 107, 135, 149 |
| abstract_inverted_index.that | 154 |
| abstract_inverted_index.when | 178 |
| abstract_inverted_index.with | 86, 180 |
| abstract_inverted_index.(STR) | 3 |
| abstract_inverted_index.Based | 137 |
| abstract_inverted_index.Scene | 0, 62 |
| abstract_inverted_index.clear | 168 |
| abstract_inverted_index.data, | 22 |
| abstract_inverted_index.data. | 66, 183 |
| abstract_inverted_index.hints | 104 |
| abstract_inverted_index.mixed | 179 |
| abstract_inverted_index.novel | 58 |
| abstract_inverted_index.style | 98 |
| abstract_inverted_index.this, | 53 |
| abstract_inverted_index.those | 161 |
| abstract_inverted_index.three | 69 |
| abstract_inverted_index.using | 102 |
| abstract_inverted_index.common | 171 |
| abstract_inverted_index.either | 8 |
| abstract_inverted_index.enjoys | 114 |
| abstract_inverted_index.models | 156 |
| abstract_inverted_index.proper | 90 |
| abstract_inverted_index.scale. | 50 |
| abstract_inverted_index.tackle | 52 |
| abstract_inverted_index.these, | 139 |
| abstract_inverted_index.visual | 37 |
| abstract_inverted_index.TextSSR | 67, 129 |
| abstract_inverted_index.control | 117 |
| abstract_inverted_index.dataset | 144 |
| abstract_inverted_index.despite | 30 |
| abstract_inverted_index.further | 174 |
| abstract_inverted_index.guiding | 97 |
| abstract_inverted_index.images, | 35 |
| abstract_inverted_index.margins | 169 |
| abstract_inverted_index.methods | 40 |
| abstract_inverted_index.million | 147 |
| abstract_inverted_index.models. | 28 |
| abstract_inverted_index.natural | 125 |
| abstract_inverted_index.precise | 115 |
| abstract_inverted_index.present | 141 |
| abstract_inverted_index.realism | 95 |
| abstract_inverted_index.relying | 123 |
| abstract_inverted_index.suffers | 4 |
| abstract_inverted_index.targets | 68 |
| abstract_inverted_index.through | 80, 133 |
| abstract_inverted_index.trained | 27, 157, 162 |
| abstract_inverted_index.without | 122 |
| abstract_inverted_index.TextSSR: | 56 |
| abstract_inverted_index.accuracy | 79 |
| abstract_inverted_index.accurate | 44 |
| abstract_inverted_index.achieves | 78 |
| abstract_inverted_index.datasets | 166 |
| abstract_inverted_index.ensuring | 89 |
| abstract_inverted_index.existing | 164 |
| abstract_inverted_index.language | 126 |
| abstract_inverted_index.limiting | 23 |
| abstract_inverted_index.observed | 177 |
| abstract_inverted_index.pipeline | 59 |
| abstract_inverted_index.prompts. | 127 |
| abstract_inverted_index.proposed | 82 |
| abstract_inverted_index.realism, | 74 |
| abstract_inverted_index.semantic | 119 |
| abstract_inverted_index.struggle | 41 |
| abstract_inverted_index.supports | 130 |
| abstract_inverted_index.training | 12, 65, 182 |
| abstract_inverted_index.Extensive | 151 |
| abstract_inverted_index.TextSSR-F | 159 |
| abstract_inverted_index.accuracy, | 73 |
| abstract_inverted_index.appealing | 33 |
| abstract_inverted_index.available | 186 |
| abstract_inverted_index.character | 91 |
| abstract_inverted_index.coherence | 120 |
| abstract_inverted_index.diffusion | 112 |
| abstract_inverted_index.introduce | 55 |
| abstract_inverted_index.maintains | 94 |
| abstract_inverted_index.producing | 31 |
| abstract_inverted_index.realistic | 10, 46 |
| abstract_inverted_index.synthetic | 11, 165 |
| abstract_inverted_index.Meanwhile, | 29 |
| abstract_inverted_index.TextSSR-F, | 142 |
| abstract_inverted_index.Therefore, | 128 |
| abstract_inverted_index.appearance | 100 |
| abstract_inverted_index.challenges | 6 |
| abstract_inverted_index.collecting | 18 |
| abstract_inverted_index.contextual | 103 |
| abstract_inverted_index.difficulty | 16 |
| abstract_inverted_index.generation | 39, 85, 101, 132 |
| abstract_inverted_index.instances. | 150 |
| abstract_inverted_index.outperform | 160 |
| abstract_inverted_index.placement. | 92 |
| abstract_inverted_index.real-world | 21, 181 |
| abstract_inverted_index.sufficient | 19 |
| abstract_inverted_index.synthesize | 43 |
| abstract_inverted_index.Recognition | 64 |
| abstract_inverted_index.background. | 109 |
| abstract_inverted_index.benchmarks, | 172 |
| abstract_inverted_index.experiments | 152 |
| abstract_inverted_index.large-scale | 131 |
| abstract_inverted_index.recognition | 2 |
| abstract_inverted_index.surrounding | 106 |
| abstract_inverted_index.Synthesizing | 61 |
| abstract_inverted_index.architecture | 113 |
| abstract_inverted_index.enhancement, | 88 |
| abstract_inverted_index.high-quality | 20 |
| abstract_inverted_index.holistically | 32 |
| abstract_inverted_index.improvements | 175 |
| abstract_inverted_index.scalability. | 76 |
| abstract_inverted_index.synthesizing | 71 |
| abstract_inverted_index.combinatorial | 134 |
| abstract_inverted_index.effectiveness | 25 |
| abstract_inverted_index.permutations. | 136 |
| abstract_inverted_index.preservation, | 121 |
| abstract_inverted_index.instance-level | 47 |
| abstract_inverted_index.position-glyph | 87 |
| abstract_inverted_index.region-centric | 83 |
| abstract_inverted_index.character-aware | 111 |
| abstract_inverted_index.character-level | 116 |
| abstract_inverted_index.diffusion-based | 36 |
| abstract_inverted_index.characteristics: | 72 |
| abstract_inverted_index.quality-screened | 148 |
| abstract_inverted_index.https://github.com/YesianRohn/TextSSR. | 188 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
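
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning those rows back into running text follows; the helper names `parse_rows`, `positions_to_tokens`, and `rebuild_abstract` are illustrative and not part of any OpenAlex client library, and the sample rows are copied from the table.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows: list[str]) -> dict[str, list[int]]:
    """Parse '| abstract_inverted_index.<token> | p1, p2 |' rows into a dict."""
    index: dict[str, list[int]] = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not inverted-index entries
        token = cells[0][len(PREFIX):]
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def positions_to_tokens(index: dict[str, list[int]]) -> dict[int, str]:
    """Invert token -> positions into position -> token."""
    by_position: dict[int, str] = {}
    for token, positions in index.items():
        for pos in positions:
            by_position[pos] = token
    return by_position


def rebuild_abstract(index: dict[str, list[int]]) -> str:
    """Join tokens in position order; complete only if every row is supplied."""
    by_position = positions_to_tokens(index)
    return " ".join(by_position[i] for i in sorted(by_position))


# Rows copied from the payload table; together they cover positions 0 to 7.
rows = [
    "| abstract_inverted_index.Scene | 0, 62 |",
    "| abstract_inverted_index.text | 1, 34, 38, 48, 84, 107, 135, 149 |",
    "| abstract_inverted_index.recognition | 2 |",
    "| abstract_inverted_index.(STR) | 3 |",
    "| abstract_inverted_index.suffers | 4 |",
    "| abstract_inverted_index.from | 5, 105 |",
    "| abstract_inverted_index.challenges | 6 |",
    "| abstract_inverted_index.of | 7, 17, 26, 145 |",
]

by_position = positions_to_tokens(parse_rows(rows))
opening = " ".join(by_position[i] for i in range(8))
print(opening)  # Scene text recognition (STR) suffers from challenges of
```

Feeding all of the `abstract_inverted_index.*` rows to `rebuild_abstract` should reproduce the abstract shown at the top of the page, since positions 0 through 188 are each assigned exactly one token.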