FastSpeech: Fast, Robust and Controllable Text to Speech
2024 · Open Access · DOI: https://doi.org/10.57702/jd8hw0cw
Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrograms in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction; the predicted durations are used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence, which enables parallel mel-spectrogram generation. Experiments on the LJSpeech dataset show that our parallel model matches autoregressive models in terms of speech quality, nearly eliminates the problem of word skipping and repeating in particularly hard cases, and can adjust voice speed smoothly. Most importantly, compared with autoregressive Transformer TTS, our model speeds up mel-spectrogram generation by 270x and end-to-end speech synthesis by 38x. Therefore, we call our model FastSpeech.
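The length regulator mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `length_regulate`, the NumPy representation of the phoneme hidden states, and the `alpha` scaling factor (included here only to show how voice speed might be adjusted) are all assumptions made for illustration. Each phoneme vector is simply repeated for as many mel-spectrogram frames as its duration specifies.

```python
import numpy as np


def length_regulate(phoneme_hidden, durations, alpha=1.0):
    """Expand a phoneme sequence to frame level (illustrative sketch).

    phoneme_hidden: array of shape (num_phonemes, hidden_dim).
    durations: number of mel frames per phoneme (assumed already predicted).
    alpha: hypothetical speed factor; values above 1 lengthen the output
           (slower speech), values below 1 shorten it (faster speech).
    """
    phoneme_hidden = np.asarray(phoneme_hidden)
    # Scale the durations, round to whole frames, and forbid negatives.
    scaled = np.rint(np.asarray(durations, dtype=float) * alpha).astype(int)
    scaled = np.maximum(scaled, 0)
    # Repeat row i of the hidden states scaled[i] times along the time axis.
    return np.repeat(phoneme_hidden, scaled, axis=0)


# Three phonemes with 2-dimensional hidden states and durations 2, 1, 3.
hidden = np.arange(6).reshape(3, 2)
frames = length_regulate(hidden, [2, 1, 3])
print(frames.shape)  # (6, 2): one row per mel-spectrogram frame
```

Because the expanded sequence already has the target length, every output frame can be computed at once rather than one step at a time, which is the property the abstract credits for the speedup.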
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.57702/jd8hw0cw
- OA Status: green
- Cited By: 258
- Related Works: 20
- OpenAlex ID: https://openalex.org/W2970730223
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2970730223 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.57702/jd8hw0cw (Digital Object Identifier)
- Title: FastSpeech: Fast, Robust and Controllable Text to Speech (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-01-01 (full publication date if available)
- Authors: Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie‐Yan Liu (list of authors in order)
- Landing page: https://doi.org/10.57702/jd8hw0cw (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://doi.org/10.57702/jd8hw0cw (direct OA link when available)
- Concepts: Spectrogram, Speech recognition, Computer science, Speech synthesis, Autoregressive model, Encoder, Artificial neural network, Parametric statistics, Transformer, Hidden Markov model, Artificial intelligence, Mathematics, Engineering, Econometrics, Voltage, Operating system, Electrical engineering, Statistics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 258 (total citation count in OpenAlex)
- Citations by year (recent): 2022: 18, 2021: 148, 2020: 84, 2019: 7, 2018: 1 (per-year citation counts, last 5 years)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2970730223 |
|---|---|
| doi | https://doi.org/10.57702/jd8hw0cw |
| ids.doi | https://doi.org/10.57702/jd8hw0cw |
| ids.mag | 2970730223 |
| ids.openalex | https://openalex.org/W2970730223 |
| fwci | |
| type | article |
| title | FastSpeech: Fast, Robust and Controllable Text to Speech |
| biblio.issue | |
| biblio.volume | 32 |
| biblio.last_page | 3174 |
| biblio.first_page | 3165 |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9984999895095825 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9901999831199646 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T12031 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9686999917030334 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech and dialogue systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C45273575 |
| concepts[0].level | 2 |
| concepts[0].score | 0.9385027885437012 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q578970 |
| concepts[0].display_name | Spectrogram |
| concepts[1].id | https://openalex.org/C28490314 |
| concepts[1].level | 1 |
| concepts[1].score | 0.7574721574783325 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[1].display_name | Speech recognition |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.7501735091209412 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C14999030 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5690464377403259 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q16346 |
| concepts[3].display_name | Speech synthesis |
| concepts[4].id | https://openalex.org/C159877910 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5394078493118286 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2202883 |
| concepts[4].display_name | Autoregressive model |
| concepts[5].id | https://openalex.org/C118505674 |
| concepts[5].level | 2 |
| concepts[5].score | 0.531168520450592 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[5].display_name | Encoder |
| concepts[6].id | https://openalex.org/C50644808 |
| concepts[6].level | 2 |
| concepts[6].score | 0.48929616808891296 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[6].display_name | Artificial neural network |
| concepts[7].id | https://openalex.org/C117251300 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4890082776546478 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1849855 |
| concepts[7].display_name | Parametric statistics |
| concepts[8].id | https://openalex.org/C66322947 |
| concepts[8].level | 3 |
| concepts[8].score | 0.47582557797431946 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[8].display_name | Transformer |
| concepts[9].id | https://openalex.org/C23224414 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4182334542274475 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q176769 |
| concepts[9].display_name | Hidden Markov model |
| concepts[10].id | https://openalex.org/C154945302 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3300478458404541 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[10].display_name | Artificial intelligence |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.11929896473884583 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C127413603 |
| concepts[12].level | 0 |
| concepts[12].score | 0.07397344708442688 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[12].display_name | Engineering |
| concepts[13].id | https://openalex.org/C149782125 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[13].display_name | Econometrics |
| concepts[14].id | https://openalex.org/C165801399 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[14].display_name | Voltage |
| concepts[15].id | https://openalex.org/C111919701 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[15].display_name | Operating system |
| concepts[16].id | https://openalex.org/C119599485 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[16].display_name | Electrical engineering |
| concepts[17].id | https://openalex.org/C105795698 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[17].display_name | Statistics |
| keywords[0].id | https://openalex.org/keywords/spectrogram |
| keywords[0].score | 0.9385027885437012 |
| keywords[0].display_name | Spectrogram |
| keywords[1].id | https://openalex.org/keywords/speech-recognition |
| keywords[1].score | 0.7574721574783325 |
| keywords[1].display_name | Speech recognition |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.7501735091209412 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/speech-synthesis |
| keywords[3].score | 0.5690464377403259 |
| keywords[3].display_name | Speech synthesis |
| keywords[4].id | https://openalex.org/keywords/autoregressive-model |
| keywords[4].score | 0.5394078493118286 |
| keywords[4].display_name | Autoregressive model |
| keywords[5].id | https://openalex.org/keywords/encoder |
| keywords[5].score | 0.531168520450592 |
| keywords[5].display_name | Encoder |
| keywords[6].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[6].score | 0.48929616808891296 |
| keywords[6].display_name | Artificial neural network |
| keywords[7].id | https://openalex.org/keywords/parametric-statistics |
| keywords[7].score | 0.4890082776546478 |
| keywords[7].display_name | Parametric statistics |
| keywords[8].id | https://openalex.org/keywords/transformer |
| keywords[8].score | 0.47582557797431946 |
| keywords[8].display_name | Transformer |
| keywords[9].id | https://openalex.org/keywords/hidden-markov-model |
| keywords[9].score | 0.4182334542274475 |
| keywords[9].display_name | Hidden Markov model |
| keywords[10].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[10].score | 0.3300478458404541 |
| keywords[10].display_name | Artificial intelligence |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.11929896473884583 |
| keywords[11].display_name | Mathematics |
| keywords[12].id | https://openalex.org/keywords/engineering |
| keywords[12].score | 0.07397344708442688 |
| keywords[12].display_name | Engineering |
| language | en |
| locations[0].id | doi:10.57702/jd8hw0cw |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S7407053387 |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | TIB Data Manager |
| locations[0].source.host_organization | |
| locations[0].source.host_organization_name | |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | dataset |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.57702/jd8hw0cw |
| locations[1].id | mag:2970730223 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306420609 |
| locations[1].source.issn | |
| locations[1].source.type | conference |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | Neural Information Processing Systems |
| locations[1].source.host_organization | |
| locations[1].source.host_organization_name | |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | Neural Information Processing Systems |
| locations[1].landing_page_url | https://papers.nips.cc/paper/2019/file/f63f65b503e22cb970527f23c9ad7db1-Paper.pdf |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5088179161 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9889-5460 |
| authorships[0].author.display_name | Yi Ren |
| authorships[0].countries | CN |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I76130692 |
| authorships[0].affiliations[0].raw_affiliation_string | Zhejiang University, Hangzhou, China |
| authorships[0].institutions[0].id | https://openalex.org/I76130692 |
| authorships[0].institutions[0].ror | https://ror.org/00a2xv884 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I76130692 |
| authorships[0].institutions[0].country_code | CN |
| authorships[0].institutions[0].display_name | Zhejiang University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yi Ren |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Zhejiang University, Hangzhou, China |
| authorships[1].author.id | https://openalex.org/A5074108427 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-0816-219X |
| authorships[1].author.display_name | Yangjun Ruan |
| authorships[1].countries | CN |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I76130692 |
| authorships[1].affiliations[0].raw_affiliation_string | Zhejiang University, Hangzhou, China |
| authorships[1].institutions[0].id | https://openalex.org/I76130692 |
| authorships[1].institutions[0].ror | https://ror.org/00a2xv884 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I76130692 |
| authorships[1].institutions[0].country_code | CN |
| authorships[1].institutions[0].display_name | Zhejiang University |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yangjun Ruan |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Zhejiang University, Hangzhou, China |
| authorships[2].author.id | https://openalex.org/A5018286848 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6123-4378 |
| authorships[2].author.display_name | Xu Tan |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I1290206253 |
| authorships[2].affiliations[0].raw_affiliation_string | Microsoft (United States), Redmond, United States |
| authorships[2].institutions[0].id | https://openalex.org/I1290206253 |
| authorships[2].institutions[0].ror | https://ror.org/00d0nc645 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I1290206253 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Microsoft (United States) |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xu Tan |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Microsoft (United States), Redmond, United States |
| authorships[3].author.id | https://openalex.org/A5020025718 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9095-0776 |
| authorships[3].author.display_name | Tao Qin |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I1290206253 |
| authorships[3].affiliations[0].raw_affiliation_string | Microsoft (United States), Redmond, United States |
| authorships[3].institutions[0].id | https://openalex.org/I1290206253 |
| authorships[3].institutions[0].ror | https://ror.org/00d0nc645 |
| authorships[3].institutions[0].type | company |
| authorships[3].institutions[0].lineage | https://openalex.org/I1290206253 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | Microsoft (United States) |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Tao Qin |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Microsoft (United States), Redmond, United States |
| authorships[4].author.id | https://openalex.org/A5100329353 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-9624-5381 |
| authorships[4].author.display_name | Sheng Zhao |
| authorships[4].countries | US |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I1290206253 |
| authorships[4].affiliations[0].raw_affiliation_string | Microsoft (United States), Redmond, United States |
| authorships[4].institutions[0].id | https://openalex.org/I1290206253 |
| authorships[4].institutions[0].ror | https://ror.org/00d0nc645 |
| authorships[4].institutions[0].type | company |
| authorships[4].institutions[0].lineage | https://openalex.org/I1290206253 |
| authorships[4].institutions[0].country_code | US |
| authorships[4].institutions[0].display_name | Microsoft (United States) |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Sheng Zhao |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Microsoft (United States), Redmond, United States |
| authorships[5].author.id | https://openalex.org/A5079260216 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-6121-0384 |
| authorships[5].author.display_name | Zhou Zhao |
| authorships[5].countries | CN |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I76130692 |
| authorships[5].affiliations[0].raw_affiliation_string | Zhejiang University, Hangzhou, China |
| authorships[5].institutions[0].id | https://openalex.org/I76130692 |
| authorships[5].institutions[0].ror | https://ror.org/00a2xv884 |
| authorships[5].institutions[0].type | education |
| authorships[5].institutions[0].lineage | https://openalex.org/I76130692 |
| authorships[5].institutions[0].country_code | CN |
| authorships[5].institutions[0].display_name | Zhejiang University |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhou Zhao |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | Zhejiang University, Hangzhou, China |
| authorships[6].author.id | https://openalex.org/A5115592065 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Tie‐Yan Liu |
| authorships[6].countries | US |
| authorships[6].affiliations[0].institution_ids | https://openalex.org/I1290206253 |
| authorships[6].affiliations[0].raw_affiliation_string | Microsoft (United States), Redmond, United States |
| authorships[6].institutions[0].id | https://openalex.org/I1290206253 |
| authorships[6].institutions[0].ror | https://ror.org/00d0nc645 |
| authorships[6].institutions[0].type | company |
| authorships[6].institutions[0].lineage | https://openalex.org/I1290206253 |
| authorships[6].institutions[0].country_code | US |
| authorships[6].institutions[0].display_name | Microsoft (United States) |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Tie-Yan Liu |
| authorships[6].is_corresponding | False |
| authorships[6].raw_affiliation_strings | Microsoft (United States), Redmond, United States |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.57702/jd8hw0cw |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | FastSpeech: Fast, Robust and Controllable Text to Speech |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9984999895095825 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W3130016944, https://openalex.org/W3015338123, https://openalex.org/W2970006822, https://openalex.org/W2964308564, https://openalex.org/W2964243274, https://openalex.org/W2964121744, https://openalex.org/W2963609956, https://openalex.org/W2963403868, https://openalex.org/W2963300588, https://openalex.org/W2949382160, https://openalex.org/W2903739847, https://openalex.org/W2964307104, https://openalex.org/W2963691546, https://openalex.org/W2471520273, https://openalex.org/W2766812927, https://openalex.org/W2963091184, https://openalex.org/W1494198834, https://openalex.org/W2901997113, https://openalex.org/W2747874407, https://openalex.org/W2120847449 |
| cited_by_count | 258 |
| counts_by_year[0].year | 2022 |
| counts_by_year[0].cited_by_count | 18 |
| counts_by_year[1].year | 2021 |
| counts_by_year[1].cited_by_count | 148 |
| counts_by_year[2].year | 2020 |
| counts_by_year[2].cited_by_count | 84 |
| counts_by_year[3].year | 2019 |
| counts_by_year[3].cited_by_count | 7 |
| counts_by_year[4].year | 2018 |
| counts_by_year[4].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | doi:10.57702/jd8hw0cw |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S7407053387 |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | TIB Data Manager |
| best_oa_location.source.host_organization | |
| best_oa_location.source.host_organization_name | |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | dataset |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.57702/jd8hw0cw |
| primary_location.id | doi:10.57702/jd8hw0cw |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S7407053387 |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | TIB Data Manager |
| primary_location.source.host_organization | |
| primary_location.source.host_organization_name | |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | dataset |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.57702/jd8hw0cw |
| publication_date | 2024-01-01 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 86, 119 |
| abstract_inverted_index.2) | 20 |
| abstract_inverted_index.In | 81 |
| abstract_inverted_index.an | 106 |
| abstract_inverted_index.as | 37 |
| abstract_inverted_index.by | 118, 191, 198 |
| abstract_inverted_index.in | 96, 154, 168 |
| abstract_inverted_index.is | 61, 116 |
| abstract_inverted_index.of | 13, 74, 132, 156, 163 |
| abstract_inverted_index.on | 91, 142 |
| abstract_inverted_index.or | 70, 78 |
| abstract_inverted_index.to | 5, 93, 122, 128 |
| abstract_inverted_index.up | 188 |
| abstract_inverted_index.we | 84, 101, 201 |
| abstract_inverted_index.and | 27, 43, 57, 72, 166, 172, 193 |
| abstract_inverted_index.are | 68 |
| abstract_inverted_index.can | 173 |
| abstract_inverted_index.for | 98, 111, 137 |
| abstract_inverted_index.has | 8 |
| abstract_inverted_index.not | 63 |
| abstract_inverted_index.our | 148, 185, 203 |
| abstract_inverted_index.the | 11, 32, 58, 124, 130, 133, 143, 161, 194 |
| abstract_inverted_index.270x | 192 |
| abstract_inverted_index.38x. | 199 |
| abstract_inverted_index.Most | 178 |
| abstract_inverted_index.TTS, | 184 |
| abstract_inverted_index.TTS. | 99 |
| abstract_inverted_index.call | 202 |
| abstract_inverted_index.from | 25, 31, 53, 105 |
| abstract_inverted_index.hard | 170 |
| abstract_inverted_index.lack | 73 |
| abstract_inverted_index.show | 146 |
| abstract_inverted_index.slow | 54 |
| abstract_inverted_index.some | 66 |
| abstract_inverted_index.such | 36 |
| abstract_inverted_index.text | 4 |
| abstract_inverted_index.that | 147 |
| abstract_inverted_index.then | 28 |
| abstract_inverted_index.this | 82 |
| abstract_inverted_index.used | 117 |
| abstract_inverted_index.with | 40, 181 |
| abstract_inverted_index.word | 164 |
| abstract_inverted_index.(TTS) | 7 |
| abstract_inverted_index.based | 2, 49, 90, 108 |
| abstract_inverted_index.first | 22 |
| abstract_inverted_index.match | 129 |
| abstract_inverted_index.model | 110, 150, 186, 204 |
| abstract_inverted_index.novel | 87 |
| abstract_inverted_index.speed | 77, 176 |
| abstract_inverted_index.terms | 155 |
| abstract_inverted_index.text, | 26 |
| abstract_inverted_index.using | 34 |
| abstract_inverted_index.voice | 175 |
| abstract_inverted_index.which | 115 |
| abstract_inverted_index.words | 67 |
| abstract_inverted_index.work, | 83 |
| abstract_inverted_index.(e.g., | 18 |
| abstract_inverted_index.(i.e., | 65 |
| abstract_inverted_index.(voice | 76 |
| abstract_inverted_index.Neural | 0 |
| abstract_inverted_index.adjust | 174 |
| abstract_inverted_index.cases, | 171 |
| abstract_inverted_index.expand | 123 |
| abstract_inverted_index.length | 120, 131 |
| abstract_inverted_index.models | 51, 153 |
| abstract_inverted_index.nearly | 159 |
| abstract_inverted_index.neural | 47 |
| abstract_inverted_index.robust | 64 |
| abstract_inverted_index.source | 125 |
| abstract_inverted_index.speech | 6, 30, 60, 157, 196 |
| abstract_inverted_index.speed, | 56 |
| abstract_inverted_index.speeds | 187 |
| abstract_inverted_index.suffer | 52 |
| abstract_inverted_index.target | 134 |
| abstract_inverted_index.dataset | 145 |
| abstract_inverted_index.extract | 102 |
| abstract_inverted_index.matches | 151 |
| abstract_inverted_index.methods | 17 |
| abstract_inverted_index.network | 1, 48, 89 |
| abstract_inverted_index.phoneme | 112, 126 |
| abstract_inverted_index.problem | 162 |
| abstract_inverted_index.propose | 85 |
| abstract_inverted_index.prosody | 79 |
| abstract_inverted_index.quality | 12 |
| abstract_inverted_index.skipped | 69 |
| abstract_inverted_index.speech. | 15 |
| abstract_inverted_index.teacher | 109 |
| abstract_inverted_index.usually | 21, 62 |
| abstract_inverted_index.vocoder | 35 |
| abstract_inverted_index.Compared | 39 |
| abstract_inverted_index.LJSpeech | 144 |
| abstract_inverted_index.Tacotron | 19 |
| abstract_inverted_index.WaveNet. | 38 |
| abstract_inverted_index.compared | 180 |
| abstract_inverted_index.duration | 113 |
| abstract_inverted_index.generate | 23, 94 |
| abstract_inverted_index.improved | 10 |
| abstract_inverted_index.parallel | 97, 138, 149 |
| abstract_inverted_index.quality, | 158 |
| abstract_inverted_index.sequence | 127, 136 |
| abstract_inverted_index.skipping | 165 |
| abstract_inverted_index.Prominent | 16 |
| abstract_inverted_index.attention | 103 |
| abstract_inverted_index.control). | 80 |
| abstract_inverted_index.inference | 55 |
| abstract_inverted_index.regulator | 121 |
| abstract_inverted_index.repeated) | 71 |
| abstract_inverted_index.repeating | 167 |
| abstract_inverted_index.smoothly. | 177 |
| abstract_inverted_index.synthesis | 197 |
| abstract_inverted_index.Therefore, | 200 |
| abstract_inverted_index.alignments | 104 |
| abstract_inverted_index.eliminates | 160 |
| abstract_inverted_index.end-to-end | 3, 50, 195 |
| abstract_inverted_index.generation | 190 |
| abstract_inverted_index.parametric | 45 |
| abstract_inverted_index.synthesize | 29 |
| abstract_inverted_index.Experiments | 141 |
| abstract_inverted_index.FastSpeech. | 205 |
| abstract_inverted_index.Transformer | 92, 183 |
| abstract_inverted_index.approaches, | 46 |
| abstract_inverted_index.generation. | 140 |
| abstract_inverted_index.prediction, | 114 |
| abstract_inverted_index.statistical | 44 |
| abstract_inverted_index.synthesized | 14, 59 |
| abstract_inverted_index.traditional | 41 |
| abstract_inverted_index.feed-forward | 88 |
| abstract_inverted_index.importantly, | 179 |
| abstract_inverted_index.particularly | 169 |
| abstract_inverted_index.Specifically, | 100 |
| abstract_inverted_index.concatenative | 42 |
| abstract_inverted_index.significantly | 9 |
| abstract_inverted_index.autoregressive | 152, 182 |
| abstract_inverted_index.controllability | 75 |
| abstract_inverted_index.encoder-decoder | 107 |
| abstract_inverted_index.mel-spectrogram | 24, 33, 95, 135, 139, 189 |
| cited_by_percentile_year | |
| countries_distinct_count | 2 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.6000000238418579 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
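
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the zero-based positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is a truncated excerpt that keeps only the first occurrence of each token, so it covers positions 0 to 7 only.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order; positions absent from the index are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated excerpt of the rows above (first occurrence of each token only).
sample = {
    "Neural": [0],
    "network": [1],
    "based": [2],
    "end-to-end": [3],
    "text": [4],
    "to": [5],
    "speech": [6],
    "(TTS)": [7],
}
print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same procedure should recover the abstract shown at the top of this record, since each position from 0 to 205 is assigned exactly one token.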