Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2406.05551
Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the information embedded in the audio, which in turn restricts their generative capabilities. To circumvent these issues, we propose encoding audio as vector sequences in continuous space $\mathbb{R}^d$ and autoregressively generating these sequences using a decoder-only diffusion transformer (ARDiT). Our findings indicate that ARDiT excels in zero-shot text-to-speech and exhibits performance that is comparable to or even surpasses that of state-of-the-art models. High-bitrate continuous speech representation enables almost flawless reconstruction, allowing our model to achieve nearly perfect speech editing. Our experiments reveal that employing Integral Kullback-Leibler (IKL) divergence for distillation at each autoregressive step significantly boosts the perceived quality of the samples. Simultaneously, it condenses the iterative sampling process of the diffusion model into a single step. Furthermore, ARDiT can be trained to predict several continuous vectors in one step, significantly reducing latency during sampling. Impressively, one of our models can generate 170 ms of 24 kHz speech per evaluation step with minimal degradation in performance. Audio samples are available at http://ardit-tts.github.io/.
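To make the last figure concrete, here is a back-of-the-envelope sketch (not taken from the paper) of what 170 ms of 24 kHz speech per evaluation step means in waveform samples, and how many evaluation steps an utterance would need at that rate. The function names and the 10-second utterance are illustrative assumptions; only the 170 ms and 24 kHz values come from the abstract.

```python
import math

SAMPLE_RATE_HZ = 24_000  # from the abstract: 24 kHz speech
STEP_MS = 170            # from the abstract: 170 ms generated per evaluation step


def samples_per_step(sample_rate_hz: int = SAMPLE_RATE_HZ, step_ms: int = STEP_MS) -> int:
    """Number of waveform samples covered by one evaluation step."""
    return sample_rate_hz * step_ms // 1000


def steps_for_duration(duration_ms: int, step_ms: int = STEP_MS) -> int:
    """Evaluation steps needed to cover duration_ms of speech, rounding up."""
    return math.ceil(duration_ms / step_ms)


print(samples_per_step())          # 4080
print(steps_for_duration(10_000))  # 59 (hypothetical 10-second utterance)
```

Under these assumptions, a 10-second utterance would take roughly 59 evaluation steps, compared with 240,000 steps if a model emitted one waveform sample at a time.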
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2406.05551, https://arxiv.org/pdf/2406.05551
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4399554908
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4399554908 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.05551 (Digital Object Identifier)
- Title: Autoregressive Diffusion Transformer for Text-to-Speech Synthesis (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-08 (full publication date if available)
- Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.05551 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.05551 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.05551 (direct OA link when available)
- Concepts: Autoregressive model, Transformer, Speech recognition, Computer science, Diffusion, Natural language processing, Electrical engineering, Econometrics, Mathematics, Engineering, Physics, Voltage, Thermodynamics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
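The record summarized above can, in principle, be retrieved again from the public OpenAlex API. The following is a minimal sketch, assuming the `https://api.openalex.org/works/<ID>` endpoint and network access; the helper names `work_api_url` and `fetch_work` are hypothetical and not part of any OpenAlex client library.

```python
import json
import urllib.request

API_BASE = "https://api.openalex.org/works/"


def work_api_url(openalex_id: str) -> str:
    """Turn a bare OpenAlex ID or an openalex.org URL into an API URL."""
    short_id = openalex_id.rstrip("/").rsplit("/", 1)[-1]
    return API_BASE + short_id


def fetch_work(openalex_id: str) -> dict:
    """Download the raw JSON record for one work (requires network access)."""
    with urllib.request.urlopen(work_api_url(openalex_id)) as response:
        return json.load(response)


print(work_api_url("https://openalex.org/W4399554908"))
```

Calling `fetch_work("W4399554908")` should return a dictionary whose keys correspond to the fields listed on this page, although the exact set of fields may change over time.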
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4399554908 |
| doi | https://doi.org/10.48550/arxiv.2406.05551 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.05551 |
| ids.openalex | https://openalex.org/W4399554908 |
| fwci | |
| type | preprint |
| title | Autoregressive Diffusion Transformer for Text-to-Speech Synthesis |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9939000010490417 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9251000285148621 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C159877910 |
| concepts[0].level | 2 |
| concepts[0].score | 0.717806875705719 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2202883 |
| concepts[0].display_name | Autoregressive model |
| concepts[1].id | https://openalex.org/C66322947 |
| concepts[1].level | 3 |
| concepts[1].score | 0.6032219529151917 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[1].display_name | Transformer |
| concepts[2].id | https://openalex.org/C28490314 |
| concepts[2].level | 1 |
| concepts[2].score | 0.48506084084510803 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[2].display_name | Speech recognition |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.48174241185188293 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C69357855 |
| concepts[4].level | 2 |
| concepts[4].score | 0.48023825883865356 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[4].display_name | Diffusion |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.32293224334716797 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C119599485 |
| concepts[6].level | 1 |
| concepts[6].score | 0.21377459168434143 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[6].display_name | Electrical engineering |
| concepts[7].id | https://openalex.org/C149782125 |
| concepts[7].level | 1 |
| concepts[7].score | 0.20260953903198242 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[7].display_name | Econometrics |
| concepts[8].id | https://openalex.org/C33923547 |
| concepts[8].level | 0 |
| concepts[8].score | 0.1875239610671997 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[8].display_name | Mathematics |
| concepts[9].id | https://openalex.org/C127413603 |
| concepts[9].level | 0 |
| concepts[9].score | 0.1634126901626587 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[9].display_name | Engineering |
| concepts[10].id | https://openalex.org/C121332964 |
| concepts[10].level | 0 |
| concepts[10].score | 0.15385296940803528 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[10].display_name | Physics |
| concepts[11].id | https://openalex.org/C165801399 |
| concepts[11].level | 2 |
| concepts[11].score | 0.07381802797317505 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[11].display_name | Voltage |
| concepts[12].id | https://openalex.org/C97355855 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[12].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/autoregressive-model |
| keywords[0].score | 0.717806875705719 |
| keywords[0].display_name | Autoregressive model |
| keywords[1].id | https://openalex.org/keywords/transformer |
| keywords[1].score | 0.6032219529151917 |
| keywords[1].display_name | Transformer |
| keywords[2].id | https://openalex.org/keywords/speech-recognition |
| keywords[2].score | 0.48506084084510803 |
| keywords[2].display_name | Speech recognition |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.48174241185188293 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/diffusion |
| keywords[4].score | 0.48023825883865356 |
| keywords[4].display_name | Diffusion |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.32293224334716797 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/electrical-engineering |
| keywords[6].score | 0.21377459168434143 |
| keywords[6].display_name | Electrical engineering |
| keywords[7].id | https://openalex.org/keywords/econometrics |
| keywords[7].score | 0.20260953903198242 |
| keywords[7].display_name | Econometrics |
| keywords[8].id | https://openalex.org/keywords/mathematics |
| keywords[8].score | 0.1875239610671997 |
| keywords[8].display_name | Mathematics |
| keywords[9].id | https://openalex.org/keywords/engineering |
| keywords[9].score | 0.1634126901626587 |
| keywords[9].display_name | Engineering |
| keywords[10].id | https://openalex.org/keywords/physics |
| keywords[10].score | 0.15385296940803528 |
| keywords[10].display_name | Physics |
| keywords[11].id | https://openalex.org/keywords/voltage |
| keywords[11].score | 0.07381802797317505 |
| keywords[11].display_name | Voltage |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.05551 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.05551 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.05551 |
| locations[1].id | doi:10.48550/arxiv.2406.05551 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.05551 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100425934 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-9386-7441 |
| authorships[0].author.display_name | Zhijun Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Zhijun |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100328312 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7897-2024 |
| authorships[1].author.display_name | Shuai Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Shuai |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5108413182 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Sho Inoue |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Inoue, Sho |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065778847 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Qibing Bai |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Bai, Qibing |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5032690182 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-9158-9401 |
| authorships[4].author.display_name | Haizhou Li |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Li, Haizhou |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.05551 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Autoregressive Diffusion Transformer for Text-to-Speech Synthesis |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9939000010490417 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W2171218219, https://openalex.org/W1972271943, https://openalex.org/W2150410159, https://openalex.org/W4327525404, https://openalex.org/W4287185323, https://openalex.org/W3150905897, https://openalex.org/W1520183331, https://openalex.org/W2734842993, https://openalex.org/W2168175994, https://openalex.org/W2049473509 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.05551 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.05551 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.05551 |
| primary_location.id | pmh:oai:arXiv.org:2406.05551 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.05551 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.05551 |
| publication_date | 2024-06-08 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.. | 218 |
| abstract_inverted_index.a | 7, 31, 53, 91, 170 |
| abstract_inverted_index.To | 69 |
| abstract_inverted_index.as | 6, 77 |
| abstract_inverted_index.at | 146, 216 |
| abstract_inverted_index.be | 176 |
| abstract_inverted_index.in | 59, 63, 80, 102, 183, 210 |
| abstract_inverted_index.it | 159 |
| abstract_inverted_index.ms | 199 |
| abstract_inverted_index.of | 24, 55, 115, 155, 165, 193, 200 |
| abstract_inverted_index.on | 16 |
| abstract_inverted_index.or | 111 |
| abstract_inverted_index.to | 19, 50, 110, 129, 178 |
| abstract_inverted_index.we | 73 |
| abstract_inverted_index.Our | 96, 135 |
| abstract_inverted_index.and | 37, 85, 105 |
| abstract_inverted_index.are | 48, 214 |
| abstract_inverted_index.can | 175, 196 |
| abstract_inverted_index.for | 10, 144 |
| abstract_inverted_index.kHz | 202 |
| abstract_inverted_index.one | 184, 192 |
| abstract_inverted_index.our | 127, 194 |
| abstract_inverted_index.per | 204 |
| abstract_inverted_index.the | 56, 60, 152, 156, 161, 166 |
| abstract_inverted_index.$24$ | 201 |
| abstract_inverted_index.R^d$ | 84 |
| abstract_inverted_index.When | 40 |
| abstract_inverted_index.code | 35 |
| abstract_inverted_index.each | 147 |
| abstract_inverted_index.even | 112 |
| abstract_inverted_index.have | 3 |
| abstract_inverted_index.into | 22, 169 |
| abstract_inverted_index.only | 52 |
| abstract_inverted_index.step | 149, 206 |
| abstract_inverted_index.that | 99, 108, 114, 138 |
| abstract_inverted_index.turn | 64 |
| abstract_inverted_index.with | 42, 207 |
| abstract_inverted_index.$170$ | 198 |
| abstract_inverted_index.(IKL) | 142 |
| abstract_inverted_index.ARDiT | 100, 174 |
| abstract_inverted_index.Audio | 0, 27, 212 |
| abstract_inverted_index.audio | 12, 17, 44, 76 |
| abstract_inverted_index.model | 128, 168 |
| abstract_inverted_index.often | 29 |
| abstract_inverted_index.poses | 30 |
| abstract_inverted_index.space | 82 |
| abstract_inverted_index.step, | 185 |
| abstract_inverted_index.step. | 172 |
| abstract_inverted_index.their | 66 |
| abstract_inverted_index.these | 71, 88 |
| abstract_inverted_index.using | 90 |
| abstract_inverted_index.which | 62 |
| abstract_inverted_index.almost | 123 |
| abstract_inverted_index.audio, | 61 |
| abstract_inverted_index.boosts | 151 |
| abstract_inverted_index.codes, | 45 |
| abstract_inverted_index.during | 189 |
| abstract_inverted_index.encode | 20 |
| abstract_inverted_index.excels | 101 |
| abstract_inverted_index.models | 2, 47, 195 |
| abstract_inverted_index.nearly | 131 |
| abstract_inverted_index.reveal | 137 |
| abstract_inverted_index.single | 171 |
| abstract_inverted_index.speech | 120, 133, 203 |
| abstract_inverted_index.subset | 54 |
| abstract_inverted_index.tasks, | 14 |
| abstract_inverted_index.vector | 78 |
| abstract_inverted_index.achieve | 130 |
| abstract_inverted_index.between | 34 |
| abstract_inverted_index.bitrate | 36 |
| abstract_inverted_index.dealing | 41 |
| abstract_inverted_index.emerged | 5 |
| abstract_inverted_index.enables | 122 |
| abstract_inverted_index.issues, | 72 |
| abstract_inverted_index.latency | 188 |
| abstract_inverted_index.minimal | 208 |
| abstract_inverted_index.models. | 117 |
| abstract_inverted_index.perfect | 132 |
| abstract_inverted_index.predict | 179 |
| abstract_inverted_index.process | 51, 164 |
| abstract_inverted_index.propose | 74 |
| abstract_inverted_index.quality | 154 |
| abstract_inverted_index.relying | 15 |
| abstract_inverted_index.samples | 213 |
| abstract_inverted_index.several | 180 |
| abstract_inverted_index.trained | 177 |
| abstract_inverted_index.various | 11 |
| abstract_inverted_index.vectors | 182 |
| abstract_inverted_index.$\mathbb | 83 |
| abstract_inverted_index.(ARDiT). | 95 |
| abstract_inverted_index.Integral | 140 |
| abstract_inverted_index.allowing | 126 |
| abstract_inverted_index.approach | 9 |
| abstract_inverted_index.compares | 109 |
| abstract_inverted_index.discrete | 25 |
| abstract_inverted_index.editing. | 134 |
| abstract_inverted_index.embedded | 58 |
| abstract_inverted_index.encoding | 75 |
| abstract_inverted_index.exhibits | 106 |
| abstract_inverted_index.findings | 97 |
| abstract_inverted_index.flawless | 124 |
| abstract_inverted_index.generate | 197 |
| abstract_inverted_index.indicate | 98 |
| abstract_inverted_index.language | 1, 46 |
| abstract_inverted_index.recently | 4 |
| abstract_inverted_index.reducing | 187 |
| abstract_inverted_index.samples. | 157 |
| abstract_inverted_index.sampling | 163 |
| abstract_inverted_index.symbols. | 26 |
| abstract_inverted_index.accuracy. | 39 |
| abstract_inverted_index.available | 215 |
| abstract_inverted_index.condenses | 160 |
| abstract_inverted_index.diffusion | 93, 167 |
| abstract_inverted_index.employing | 139 |
| abstract_inverted_index.iterative | 162 |
| abstract_inverted_index.necessary | 32 |
| abstract_inverted_index.perceived | 153 |
| abstract_inverted_index.promising | 8 |
| abstract_inverted_index.restricts | 65 |
| abstract_inverted_index.sampling. | 190 |
| abstract_inverted_index.sequences | 23, 79, 89 |
| abstract_inverted_index.surpasses | 113 |
| abstract_inverted_index.waveforms | 21 |
| abstract_inverted_index.zero-shot | 103 |
| abstract_inverted_index.circumvent | 70 |
| abstract_inverted_index.compromise | 33 |
| abstract_inverted_index.continuous | 81, 119, 181 |
| abstract_inverted_index.divergence | 143 |
| abstract_inverted_index.evaluation | 205 |
| abstract_inverted_index.generating | 87 |
| abstract_inverted_index.generation | 13 |
| abstract_inverted_index.generative | 67 |
| abstract_inverted_index.tokenizers | 18 |
| abstract_inverted_index.constrained | 49 |
| abstract_inverted_index.degradation | 209 |
| abstract_inverted_index.experiments | 136 |
| abstract_inverted_index.information | 57 |
| abstract_inverted_index.low-bitrate | 43 |
| abstract_inverted_index.performance | 107 |
| abstract_inverted_index.transformer | 94 |
| abstract_inverted_index.Furthermore, | 173 |
| abstract_inverted_index.High-bitrate | 118 |
| abstract_inverted_index.decoder-only | 92 |
| abstract_inverted_index.distillation | 145 |
| abstract_inverted_index.performance. | 211 |
| abstract_inverted_index.tokenization | 28 |
| abstract_inverted_index.Impressively, | 191 |
| abstract_inverted_index.capabilities. | 68 |
| abstract_inverted_index.significantly | 150, 186 |
| abstract_inverted_index.autoregressive | 148 |
| abstract_inverted_index.reconstruction | 38 |
| abstract_inverted_index.representation | 121 |
| abstract_inverted_index.text-to-speech | 104 |
| abstract_inverted_index.Simultaneously, | 158 |
| abstract_inverted_index.reconstruction, | 125 |
| abstract_inverted_index.Kullback-Leibler | 141 |
| abstract_inverted_index.autoregressively | 86 |
| abstract_inverted_index.state-of-the-art | 116 |
| abstract_inverted_index.http://ardit-tts.github.io/ | 217 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
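The `abstract_inverted_index.*` rows in the table store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text is shown below. It assumes the index is available as a dictionary of token to list of integer positions (as in the raw JSON, rather than the comma-separated strings shown in the table); the function name `rebuild_abstract` is hypothetical.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Place each token at its recorded positions and join them with spaces."""
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(token for _, token in sorted(positioned))


# A few entries copied from the table, truncated to the positions 0-5
# so that the example stays small.
sample_index = {
    "Audio": [0],
    "language": [1],
    "models": [2],
    "have": [3],
    "recently": [4],
    "emerged": [5],
}
print(rebuild_abstract(sample_index))  # Audio language models have recently emerged
```

Punctuation stays attached to its token (for example `tasks,` and `step.` are separate keys in the table), so joining on single spaces is enough to recover the original sentence boundaries.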