MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2409.09352
In accented voice conversion, or accent conversion, we seek to convert the accent in speech from one to another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, that is, pairs of accented speech samples by the same speaker, through text transliteration, for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence model on the synthetic parallel corpus for accent conversion. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.
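The data-creation flow described in the abstract (transliterate the text, synthesize it with a multilingual TTS model, pair the results per speaker) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: `transliterate` and `synthesize` are hypothetical callables standing in for an LLM and a multilingual TTS system, the `AccentPair` record is invented for this example, and pairing each accented sample with plain English speech from the same speaker is an assumption about how the parallel corpus might be laid out.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signatures (assumptions, not from the paper):
#   transliterate(text, accent) -> transliterated text (e.g. from an LLM)
#   synthesize(text, language, speaker) -> audio bytes (e.g. from a multilingual TTS)
Transliterate = Callable[[str, str], str]
Synthesize = Callable[[str, str, str], bytes]


@dataclass
class AccentPair:
    """One parallel training example: same speaker, same sentence, two accents."""
    text: str
    accent: str
    source_audio: bytes  # speech synthesized from the original English text
    target_audio: bytes  # speech synthesized from the transliterated text


def build_parallel_corpus(
    sentences: List[str],
    accents: List[str],
    speaker: str,
    transliterate: Transliterate,
    synthesize: Synthesize,
) -> List[AccentPair]:
    """Create same-speaker accent pairs via text transliteration."""
    pairs: List[AccentPair] = []
    for text in sentences:
        # Assumed reference side of the pair: untransliterated English.
        source_audio = synthesize(text, "en", speaker)
        for accent in accents:
            # Step 1: rewrite the English sentence for the target accent.
            accented_text = transliterate(text, accent)
            # Step 2: synthesize the transliterated text as that language,
            # keeping the speaker fixed so only the accent differs.
            target_audio = synthesize(accented_text, accent, speaker)
            pairs.append(AccentPair(text, accent, source_audio, target_audio))
    return pairs
```

Passing the two models in as callables keeps the sketch independent of any particular LLM or TTS library; a sequence-to-sequence accent conversion model would then be trained on the resulting `source_audio`/`target_audio` pairs.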
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.09352, https://arxiv.org/pdf/2409.09352
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403667143
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403667143 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.09352 (Digital Object Identifier)
- Title: MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-09-14 (full publication date if available)
- Authors: Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2409.09352 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.09352 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.09352 (direct OA link when available)
- Concepts: Stress (linguistics), Transliteration, Natural language processing, Computer science, Speech synthesis, Speech recognition, Artificial intelligence, Linguistics, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4403667143 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2409.09352 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.09352 |
| ids.openalex | https://openalex.org/W4403667143 |
| fwci | |
| type | preprint |
| title | MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9957000017166138 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10201 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9945999979972839 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Speech Recognition and Synthesis |
| topics[2].id | https://openalex.org/T12031 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9686999917030334 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech and dialogue systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776756274 |
| concepts[0].level | 2 |
| concepts[0].score | 0.9061947464942932 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q181767 |
| concepts[0].display_name | Stress (linguistics) |
| concepts[1].id | https://openalex.org/C520968082 |
| concepts[1].level | 2 |
| concepts[1].score | 0.8761794567108154 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q134550 |
| concepts[1].display_name | Transliteration |
| concepts[2].id | https://openalex.org/C204321447 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5682026743888855 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[2].display_name | Natural language processing |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5582820773124695 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C14999030 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5529883503913879 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q16346 |
| concepts[4].display_name | Speech synthesis |
| concepts[5].id | https://openalex.org/C28490314 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4916868805885315 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[5].display_name | Speech recognition |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.4638628363609314 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C41895202 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4450398087501526 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[7].display_name | Linguistics |
| concepts[8].id | https://openalex.org/C138885662 |
| concepts[8].level | 0 |
| concepts[8].score | 0.045176684856414795 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[8].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/stress |
| keywords[0].score | 0.9061947464942932 |
| keywords[0].display_name | Stress (linguistics) |
| keywords[1].id | https://openalex.org/keywords/transliteration |
| keywords[1].score | 0.8761794567108154 |
| keywords[1].display_name | Transliteration |
| keywords[2].id | https://openalex.org/keywords/natural-language-processing |
| keywords[2].score | 0.5682026743888855 |
| keywords[2].display_name | Natural language processing |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5582820773124695 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/speech-synthesis |
| keywords[4].score | 0.5529883503913879 |
| keywords[4].display_name | Speech synthesis |
| keywords[5].id | https://openalex.org/keywords/speech-recognition |
| keywords[5].score | 0.4916868805885315 |
| keywords[5].display_name | Speech recognition |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.4638628363609314 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/linguistics |
| keywords[7].score | 0.4450398087501526 |
| keywords[7].display_name | Linguistics |
| keywords[8].id | https://openalex.org/keywords/philosophy |
| keywords[8].score | 0.045176684856414795 |
| keywords[8].display_name | Philosophy |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.09352 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.09352 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.09352 |
| locations[1].id | doi:10.48550/arxiv.2409.09352 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.09352 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5108413182 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Sho Inoue |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Inoue, Sho |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100328312 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7897-2024 |
| authorships[1].author.display_name | Shuai Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Shuai |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5007446660 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Wanxing Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Wanxing |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5050166453 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-9867-7041 |
| authorships[3].author.display_name | Pengcheng Zhu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhu, Pengcheng |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5036369578 |
| authorships[4].author.orcid | https://orcid.org/0009-0007-6680-481X |
| authorships[4].author.display_name | Mengxiao Bi |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Bi, Mengxiao |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5032690182 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-9158-9401 |
| authorships[5].author.display_name | Haizhou Li |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Li, Haizhou |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.09352 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9957000017166138 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W3153459181, https://openalex.org/W2147866274, https://openalex.org/W2350015575, https://openalex.org/W2371976984, https://openalex.org/W2352160949, https://openalex.org/W2378436233, https://openalex.org/W2251148428, https://openalex.org/W2907809867, https://openalex.org/W1990041434, https://openalex.org/W2069398544 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.09352 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.09352 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.09352 |
| primary_location.id | pmh:oai:arXiv.org:2409.09352 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.09352 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.09352 |
| publication_date | 2024-09-14 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 30, 81, 86 |
| abstract_inverted_index.As | 80 |
| abstract_inverted_index.In | 0, 25 |
| abstract_inverted_index.We | 56, 97 |
| abstract_inverted_index.by | 44, 58 |
| abstract_inverted_index.in | 13, 118 |
| abstract_inverted_index.is | 68 |
| abstract_inverted_index.of | 40 |
| abstract_inverted_index.on | 89 |
| abstract_inverted_index.or | 4 |
| abstract_inverted_index.to | 9, 75 |
| abstract_inverted_index.we | 7, 28, 84 |
| abstract_inverted_index.TTS | 73 |
| abstract_inverted_index.and | 22, 105, 110 |
| abstract_inverted_index.fed | 70 |
| abstract_inverted_index.for | 33, 51, 94, 102 |
| abstract_inverted_index.one | 16 |
| abstract_inverted_index.our | 115 |
| abstract_inverted_index.the | 11, 45, 90, 99 |
| abstract_inverted_index.both | 103 |
| abstract_inverted_index.from | 15 |
| abstract_inverted_index.into | 71 |
| abstract_inverted_index.same | 46 |
| abstract_inverted_index.seek | 8 |
| abstract_inverted_index.text | 49, 61 |
| abstract_inverted_index.then | 69 |
| abstract_inverted_index.this | 26 |
| abstract_inverted_index.thus | 38 |
| abstract_inverted_index.with | 62 |
| abstract_inverted_index.Large | 63 |
| abstract_inverted_index.begin | 57 |
| abstract_inverted_index.built | 85 |
| abstract_inverted_index.model | 88 |
| abstract_inverted_index.novel | 31 |
| abstract_inverted_index.pairs | 39 |
| abstract_inverted_index.voice | 2 |
| abstract_inverted_index.which | 67 |
| abstract_inverted_index.while | 18 |
| abstract_inverted_index.Models | 65 |
| abstract_inverted_index.accent | 5, 12, 53, 95, 119 |
| abstract_inverted_index.corpus | 93 |
| abstract_inverted_index.method | 32, 101 |
| abstract_inverted_index.models | 74 |
| abstract_inverted_index.native | 104 |
| abstract_inverted_index.speech | 14, 36, 42 |
| abstract_inverted_index.study, | 27 |
| abstract_inverted_index.(LLMs), | 66 |
| abstract_inverted_index.English | 78, 107 |
| abstract_inverted_index.another | 17 |
| abstract_inverted_index.convert | 10 |
| abstract_inverted_index.further | 113 |
| abstract_inverted_index.samples | 43 |
| abstract_inverted_index.speaker | 20 |
| abstract_inverted_index.speech. | 79 |
| abstract_inverted_index.system, | 83 |
| abstract_inverted_index.through | 48 |
| abstract_inverted_index.Language | 64 |
| abstract_inverted_index.accented | 1, 41, 77 |
| abstract_inverted_index.content. | 24 |
| abstract_inverted_index.creating | 34 |
| abstract_inverted_index.identity | 21 |
| abstract_inverted_index.parallel | 92 |
| abstract_inverted_index.proposed | 100 |
| abstract_inverted_index.samples, | 37 |
| abstract_inverted_index.semantic | 23 |
| abstract_inverted_index.speaker, | 47 |
| abstract_inverted_index.studies. | 121 |
| abstract_inverted_index.systems. | 55 |
| abstract_inverted_index.training | 52 |
| abstract_inverted_index.validate | 114 |
| abstract_inverted_index.dataset's | 116 |
| abstract_inverted_index.formulate | 29 |
| abstract_inverted_index.objective | 111 |
| abstract_inverted_index.reference | 82 |
| abstract_inverted_index.speakers. | 108 |
| abstract_inverted_index.synthetic | 91 |
| abstract_inverted_index.validated | 98 |
| abstract_inverted_index.Subjective | 109 |
| abstract_inverted_index.conversion | 3, 54, 120 |
| abstract_inverted_index.generating | 59 |
| abstract_inverted_index.non-native | 106 |
| abstract_inverted_index.preserving | 19 |
| abstract_inverted_index.synthesize | 76 |
| abstract_inverted_index.conversion, | 6 |
| abstract_inverted_index.conversion. | 96 |
| abstract_inverted_index.evaluations | 112 |
| abstract_inverted_index.multilingual | 72 |
| abstract_inverted_index.effectiveness | 117 |
| abstract_inverted_index.multi-accented | 35 |
| abstract_inverted_index.transliterated | 60 |
| abstract_inverted_index.transliteration | 50 |
| abstract_inverted_index.sequence-to-sequence | 87 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
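
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text is shown below; the function name `rebuild_abstract` and the `sample` dictionary are illustrative, and `sample` covers only positions 0 to 6 of the index listed in the table rather than the whole abstract.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Reconstruct text from a token -> positions inverted index."""
    position_to_token: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(position_to_token[p] for p in sorted(position_to_token))


# Excerpt only: the first seven positions from the table above.
sample = {
    "In": [0],
    "accented": [1],
    "voice": [2],
    "conversion": [3],
    "or": [4],
    "accent": [5],
    "conversion,": [6],
}

print(rebuild_abstract(sample))
# prints: In accented voice conversion or accent conversion,
```

Punctuation stays attached to its token (for example `conversion,`), so joining on spaces is enough to recover the original wording.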