Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification

2020 · Open Access · DOI: https://doi.org/10.1250/ast.42.1
We propose non-parallel and many-to-many voice conversion (VC) using variational autoencoders (VAEs) that constructs VC models for converting arbitrary speakers' characteristics into those of other arbitrary speakers without parallel speech corpora for training the models. Although VAEs conditioned by one-hot coded speaker codes can achieve non-parallel VC, the phonetic contents of the converted speech tend to vanish, resulting in degraded speech quality. Another issue is that they cannot deal with unseen speakers not included in training corpora. To overcome these issues, we incorporate deep-neural-network-based automatic speech recognition (ASR) and automatic speaker verification (ASV) into the VAE-based VC. Since phonetic contents are given as phonetic posteriorgrams predicted from the ASR models, the proposed VC can overcome the quality degradation. Our VC utilizes d-vectors extracted from the ASV models as continuous speaker representations that can deal with unseen speakers. Experimental results demonstrate that our VC outperforms the conventional VAE-based VC in terms of mel-cepstral distortion and converted speech quality. We also investigate the effects of hyperparameters in our VC and reveal that 1) a large d-vector dimensionality that gives the better ASV performance does not necessarily improve converted speech quality, and 2) a large number of pre-stored speakers improves the quality.
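The abstract's core idea is to replace the one-hot speaker code of conventional VAE-based VC with two continuous representations: frame-level phonetic posteriorgrams (PPGs) from an ASR model, and an utterance-level d-vector from an ASV model. The sketch below shows only how such features would be combined into a decoder conditioning input; all shapes (`ppg_dim`, `dvec_dim`, `num_frames`) are illustrative assumptions, not the paper's actual configuration, and random arrays stand in for real ASR/ASV outputs.

```python
import numpy as np

# Hypothetical dimensionalities for illustration only.
num_frames = 200   # frames in one utterance
ppg_dim = 144      # number of phonetic classes in the posteriorgram (assumed)
dvec_dim = 32      # d-vector dimensionality (a hyperparameter studied in the paper)

rng = np.random.default_rng(0)

# Frame-level phonetic content from the ASR model: each row is a
# posterior distribution over phonetic classes (rows sum to 1).
ppg = rng.dirichlet(np.ones(ppg_dim), size=num_frames)

# One utterance-level d-vector from the ASV model, used as a continuous
# speaker representation instead of a one-hot speaker code.
d_vector = rng.standard_normal(dvec_dim)

# Condition the decoder by tiling the d-vector across frames and
# concatenating it with the frame-level phonetic features.
decoder_input = np.concatenate(
    [ppg, np.tile(d_vector, (num_frames, 1))], axis=1)

print(decoder_input.shape)  # (200, 176)
```

Because the d-vector is continuous, conditioning on a new speaker's d-vector requires no retraining, which is how the method handles speakers unseen during training.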
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1250/ast.42.1 · https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf
- OA Status: diamond
- Cited By: 1
- References: 38
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3115500749
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3115500749 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1250/ast.42.1 (Digital Object Identifier)
- Title: Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2020 (year of publication)
- Publication date: 2020-12-31 (full publication date if available)
- Authors: Yuki Saito, Taiki Nakamura, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi (list of authors in order)
- Landing page: https://doi.org/10.1250/ast.42.1 (publisher landing page)
- PDF URL: https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: diamond (open access status per OpenAlex)
- OA URL: https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf (direct OA link when available)
- Concepts: Computer science, Speech recognition, Quality (philosophy), Artificial neural network, Distortion (music), Hyperparameter, Speaker recognition, Artificial intelligence, Pattern recognition (psychology), Philosophy, Amplifier, Computer network, Epistemology, Bandwidth (computing) (top concepts attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2021: 1 (per-year citation counts, last 5 years)
- References (count): 38 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
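The flattened keys in the payload table below (e.g. `open_access.oa_status`) are just dotted paths into the nested JSON object that the OpenAlex API returns for this work (`https://api.openalex.org/works/W3115500749`). A minimal sketch of resolving such paths, using a small inline excerpt of the record rather than a live API call:

```python
import json

# Tiny excerpt of the OpenAlex work record, inlined so the example
# needs no network access; field values match the payload table.
payload = json.loads("""
{
  "id": "https://openalex.org/W3115500749",
  "doi": "https://doi.org/10.1250/ast.42.1",
  "publication_year": 2020,
  "cited_by_count": 1,
  "open_access": {"is_oa": true, "oa_status": "diamond"}
}
""")

def get_dotted(obj, dotted_key):
    """Resolve a dotted key like 'open_access.oa_status' by walking
    nested dictionaries one segment at a time."""
    for part in dotted_key.split("."):
        obj = obj[part]
    return obj

print(get_dotted(payload, "open_access.oa_status"))  # diamond
print(get_dotted(payload, "cited_by_count"))         # 1
```

`get_dotted` is a hypothetical helper for this illustration; list-valued keys such as `topics[0].id` would additionally need index handling.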
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W3115500749 |
| doi | https://doi.org/10.1250/ast.42.1 |
| ids.doi | https://doi.org/10.1250/ast.42.1 |
| ids.mag | 3115500749 |
| ids.openalex | https://openalex.org/W3115500749 |
| fwci | 0.14685955 |
| type | article |
| title | Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification |
| biblio.issue | 1 |
| biblio.volume | 42 |
| biblio.last_page | 11 |
| biblio.first_page | 1 |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9986000061035156 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9955999851226807 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7927168011665344 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C28490314 |
| concepts[1].level | 1 |
| concepts[1].score | 0.7765562534332275 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[1].display_name | Speech recognition |
| concepts[2].id | https://openalex.org/C2779530757 |
| concepts[2].level | 2 |
| concepts[2].score | 0.47524237632751465 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1207505 |
| concepts[2].display_name | Quality (philosophy) |
| concepts[3].id | https://openalex.org/C50644808 |
| concepts[3].level | 2 |
| concepts[3].score | 0.45331716537475586 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[3].display_name | Artificial neural network |
| concepts[4].id | https://openalex.org/C126780896 |
| concepts[4].level | 4 |
| concepts[4].score | 0.44991135597229004 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q899871 |
| concepts[4].display_name | Distortion (music) |
| concepts[5].id | https://openalex.org/C8642999 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4164976477622986 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q4171168 |
| concepts[5].display_name | Hyperparameter |
| concepts[6].id | https://openalex.org/C133892786 |
| concepts[6].level | 2 |
| concepts[6].score | 0.41521352529525757 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1145189 |
| concepts[6].display_name | Speaker recognition |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.38176774978637695 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.33993738889694214 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C138885662 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[9].display_name | Philosophy |
| concepts[10].id | https://openalex.org/C194257627 |
| concepts[10].level | 3 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q211554 |
| concepts[10].display_name | Amplifier |
| concepts[11].id | https://openalex.org/C31258907 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1301371 |
| concepts[11].display_name | Computer network |
| concepts[12].id | https://openalex.org/C111472728 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[12].display_name | Epistemology |
| concepts[13].id | https://openalex.org/C2776257435 |
| concepts[13].level | 2 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q1576430 |
| concepts[13].display_name | Bandwidth (computing) |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7927168011665344 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/speech-recognition |
| keywords[1].score | 0.7765562534332275 |
| keywords[1].display_name | Speech recognition |
| keywords[2].id | https://openalex.org/keywords/quality |
| keywords[2].score | 0.47524237632751465 |
| keywords[2].display_name | Quality (philosophy) |
| keywords[3].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[3].score | 0.45331716537475586 |
| keywords[3].display_name | Artificial neural network |
| keywords[4].id | https://openalex.org/keywords/distortion |
| keywords[4].score | 0.44991135597229004 |
| keywords[4].display_name | Distortion (music) |
| keywords[5].id | https://openalex.org/keywords/hyperparameter |
| keywords[5].score | 0.4164976477622986 |
| keywords[5].display_name | Hyperparameter |
| keywords[6].id | https://openalex.org/keywords/speaker-recognition |
| keywords[6].score | 0.41521352529525757 |
| keywords[6].display_name | Speaker recognition |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.38176774978637695 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.33993738889694214 |
| keywords[8].display_name | Pattern recognition (psychology) |
| language | en |
| locations[0].id | doi:10.1250/ast.42.1 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S177137575 |
| locations[0].source.issn | 0369-4232, 1346-3969, 1347-5177, 2186-859X, 2432-2040 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 0369-4232 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Nippon Onkyo Gakkaishi/Acoustical science and technology/Nihon Onkyo Gakkaishi |
| locations[0].source.host_organization | https://openalex.org/P4327988903 |
| locations[0].source.host_organization_name | Acoustical Society of Japan |
| locations[0].source.host_organization_lineage | https://openalex.org/P4327988903 |
| locations[0].source.host_organization_lineage_names | Acoustical Society of Japan |
| locations[0].license | |
| locations[0].pdf_url | https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Acoustical Science and Technology |
| locations[0].landing_page_url | https://doi.org/10.1250/ast.42.1 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5083394213 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7967-2613 |
| authorships[0].author.display_name | Yuki Saito |
| authorships[0].countries | JP |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I74801974 |
| authorships[0].affiliations[0].raw_affiliation_string | Graduate School of Information Science and Technology, The University of Tokyo |
| authorships[0].institutions[0].id | https://openalex.org/I74801974 |
| authorships[0].institutions[0].ror | https://ror.org/057zh3y96 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I74801974 |
| authorships[0].institutions[0].country_code | JP |
| authorships[0].institutions[0].display_name | The University of Tokyo |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yuki Saito |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Graduate School of Information Science and Technology, The University of Tokyo |
| authorships[1].author.id | https://openalex.org/A5104139092 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Taiki Nakamura |
| authorships[1].countries | JP |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I74801974 |
| authorships[1].affiliations[0].raw_affiliation_string | Faculty of Engineering, The University of Tokyo |
| authorships[1].institutions[0].id | https://openalex.org/I74801974 |
| authorships[1].institutions[0].ror | https://ror.org/057zh3y96 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I74801974 |
| authorships[1].institutions[0].country_code | JP |
| authorships[1].institutions[0].display_name | The University of Tokyo |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Taiki Nakamura |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Faculty of Engineering, The University of Tokyo |
| authorships[2].author.id | https://openalex.org/A5068604686 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yusuke Ijima |
| authorships[2].countries | JP |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I2251713219 |
| authorships[2].affiliations[0].raw_affiliation_string | NTT Media Intelligence Laboratories, NTT Corporation |
| authorships[2].institutions[0].id | https://openalex.org/I2251713219 |
| authorships[2].institutions[0].ror | https://ror.org/00berct97 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I2251713219 |
| authorships[2].institutions[0].country_code | JP |
| authorships[2].institutions[0].display_name | NTT (Japan) |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yusuke Ijima |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | NTT Media Intelligence Laboratories, NTT Corporation |
| authorships[3].author.id | https://openalex.org/A5110780218 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Kyosuke Nishida |
| authorships[3].countries | JP |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I2251713219 |
| authorships[3].affiliations[0].raw_affiliation_string | NTT Media Intelligence Laboratories, NTT Corporation |
| authorships[3].institutions[0].id | https://openalex.org/I2251713219 |
| authorships[3].institutions[0].ror | https://ror.org/00berct97 |
| authorships[3].institutions[0].type | company |
| authorships[3].institutions[0].lineage | https://openalex.org/I2251713219 |
| authorships[3].institutions[0].country_code | JP |
| authorships[3].institutions[0].display_name | NTT (Japan) |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Kyosuke Nishida |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | NTT Media Intelligence Laboratories, NTT Corporation |
| authorships[4].author.id | https://openalex.org/A5013050263 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-0520-7847 |
| authorships[4].author.display_name | Shinnosuke Takamichi |
| authorships[4].countries | JP |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I74801974 |
| authorships[4].affiliations[0].raw_affiliation_string | Graduate School of Information Science and Technology, The University of Tokyo |
| authorships[4].institutions[0].id | https://openalex.org/I74801974 |
| authorships[4].institutions[0].ror | https://ror.org/057zh3y96 |
| authorships[4].institutions[0].type | education |
| authorships[4].institutions[0].lineage | https://openalex.org/I74801974 |
| authorships[4].institutions[0].country_code | JP |
| authorships[4].institutions[0].display_name | The University of Tokyo |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Shinnosuke Takamichi |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Graduate School of Information Science and Technology, The University of Tokyo |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W1491159402, https://openalex.org/W4297807400, https://openalex.org/W4313854686, https://openalex.org/W2499802997, https://openalex.org/W3162054169, https://openalex.org/W1813780412, https://openalex.org/W289407349, https://openalex.org/W2029134149, https://openalex.org/W2368768466, https://openalex.org/W2757081366 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2021 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1250/ast.42.1 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S177137575 |
| best_oa_location.source.issn | 0369-4232, 1346-3969, 1347-5177, 2186-859X, 2432-2040 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 0369-4232 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Nippon Onkyo Gakkaishi/Acoustical science and technology/Nihon Onkyo Gakkaishi |
| best_oa_location.source.host_organization | https://openalex.org/P4327988903 |
| best_oa_location.source.host_organization_name | Acoustical Society of Japan |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4327988903 |
| best_oa_location.source.host_organization_lineage_names | Acoustical Society of Japan |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Acoustical Science and Technology |
| best_oa_location.landing_page_url | https://doi.org/10.1250/ast.42.1 |
| primary_location.id | doi:10.1250/ast.42.1 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S177137575 |
| primary_location.source.issn | 0369-4232, 1346-3969, 1347-5177, 2186-859X, 2432-2040 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 0369-4232 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Nippon Onkyo Gakkaishi/Acoustical science and technology/Nihon Onkyo Gakkaishi |
| primary_location.source.host_organization | https://openalex.org/P4327988903 |
| primary_location.source.host_organization_name | Acoustical Society of Japan |
| primary_location.source.host_organization_lineage | https://openalex.org/P4327988903 |
| primary_location.source.host_organization_lineage_names | Acoustical Society of Japan |
| primary_location.license | |
| primary_location.pdf_url | https://www.jstage.jst.go.jp/article/ast/42/1/42_E1968/_pdf |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Acoustical Science and Technology |
| primary_location.landing_page_url | https://doi.org/10.1250/ast.42.1 |
| publication_date | 2020-12-31 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W1991682319, https://openalex.org/W2293635543, https://openalex.org/W2142300631, https://openalex.org/W2156142001, https://openalex.org/W2126143605, https://openalex.org/W2120605154, https://openalex.org/W2046056978, https://openalex.org/W2963971656, https://openalex.org/W2964069186, https://openalex.org/W2787685498, https://openalex.org/W2963223306, https://openalex.org/W2518172956, https://openalex.org/W2804998325, https://openalex.org/W2759925408, https://openalex.org/W1999405202, https://openalex.org/W2794725088, https://openalex.org/W2963403664, https://openalex.org/W2114925438, https://openalex.org/W2963808252, https://openalex.org/W2963336460, https://openalex.org/W1592062602, https://openalex.org/W2951298482, https://openalex.org/W2160815625, https://openalex.org/W2210838531, https://openalex.org/W2154920538, https://openalex.org/W2951004968, https://openalex.org/W2146502635, https://openalex.org/W2949416428, https://openalex.org/W2049686551, https://openalex.org/W2734531544, https://openalex.org/W1482298176, https://openalex.org/W2902070858, https://openalex.org/W2556467266, https://openalex.org/W2516608830, https://openalex.org/W2795654873, https://openalex.org/W2099471712, https://openalex.org/W2605287558, https://openalex.org/W2156387975 |
| referenced_works_count | 38 |
| abstract_inverted_index | (token-to-positions map encoding the abstract reproduced above; per-token rows omitted) |
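OpenAlex stores abstracts as an inverted index: a map from each token to the list of positions where it occurs. Recovering the plain text is a sort by position. The sketch below uses a tiny hand-made index covering only the abstract's opening words, so it stays self-contained:

```python
# Miniature inverted index: token -> list of positions in the abstract.
inverted_index = {
    "We": [0],
    "propose": [1],
    "non-parallel": [2],
    "and": [3],
    "many-to-many": [4],
    "voice": [5],
    "conversion": [6],
}

# Flatten to (position, token) pairs, order by position, and rejoin.
positions = sorted(
    (pos, token)
    for token, pos_list in inverted_index.items()
    for pos in pos_list
)
abstract = " ".join(token for _, token in positions)

print(abstract)  # We propose non-parallel and many-to-many voice conversion
```

Applied to the full `abstract_inverted_index` of this record, the same procedure reproduces the abstract shown near the top of the page.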
| cited_by_percentile_year.max | 93 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile.value | 0.57944566 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |