Low-latency Real-time Voice Conversion on CPU
Konstantine Sadov, Matthew M. Hutter, Asara Near
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2311.00873
We adapt the architectures of previous audio manipulation and generation neural networks to the task of real-time any-to-one voice conversion. Our resulting model, LLVC ($\textbf{L}$ow-latency $\textbf{L}$ow-resource $\textbf{V}$oice $\textbf{C}$onversion), has a latency of under 20 ms at a sampling rate of 16 kHz and runs nearly 2.8x faster than real-time on a consumer CPU. LLVC uses both a generative adversarial architecture and knowledge distillation to attain this performance. To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model. We provide open-source samples, code, and pretrained model weights at https://github.com/KoeAI/LLVC.
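The abstract's headline figures can be turned into concrete numbers. The sketch below is only illustrative arithmetic: it assumes the 16 kHz figure is the audio sample rate, takes 20 ms as the upper bound on latency, and treats "nearly 2.8x" as exactly 2.8. The constant and function names are hypothetical and do not come from the LLVC code.

```python
# Figures quoted in the abstract; interpreting them this way is an assumption.
SAMPLE_RATE_HZ = 16_000   # 16 kHz, read as samples per second
LATENCY_BOUND_S = 0.020   # "under 20ms", used as an upper bound
SPEEDUP = 2.8             # "nearly 2.8x faster than real-time", treated as exact


def samples_in(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples that span duration_s seconds."""
    return round(duration_s * sample_rate_hz)


def real_time_factor(speedup: float = SPEEDUP) -> float:
    """Processing time divided by audio duration; below 1.0 beats real time."""
    return 1.0 / speedup


def compute_time_s(audio_s: float, speedup: float = SPEEDUP) -> float:
    """Wall-clock seconds needed to convert audio_s seconds of audio."""
    return audio_s * real_time_factor(speedup)


print(samples_in(LATENCY_BOUND_S))           # 320 samples fit in 20 ms
print(round(real_time_factor(), 3))          # 0.357
print(round(compute_time_s(1.0) * 1000, 1))  # 357.1 ms per second of audio
```

Under these assumptions, the latency bound corresponds to at most 320 samples of audio, and one second of speech would take roughly 357 ms to convert on the CPU the authors tested.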
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2311.00873, https://arxiv.org/pdf/2311.00873
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4388328812
All OpenAlex metadata
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4388328812 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2311.00873 (Digital Object Identifier)
- Title: Low-latency Real-time Voice Conversion on CPU (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-11-01 (Full publication date if available)
- Authors: Konstantine Sadov, Matthew M. Hutter, Asara Near (List of authors in order)
- Landing page: https://arxiv.org/abs/2311.00873 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2311.00873 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2311.00873 (Direct OA link when available)
- Concepts: Latency (audio), Computer science, Source code, Parallel computing, Open source, Code (set theory), Real-time computing, Speech recognition, Operating system, Software, Programming language, Telecommunications, Set (abstract data type) (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (Per-year citation counts (last 5 years))
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4388328812 |
| doi | https://doi.org/10.48550/arxiv.2311.00873 |
| ids.doi | https://doi.org/10.48550/arxiv.2311.00873 |
| ids.openalex | https://openalex.org/W4388328812 |
| fwci | |
| type | preprint |
| title | Low-latency Real-time Voice Conversion on CPU |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9991999864578247 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T11309 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9972000122070312 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Music and Audio Processing |
| topics[2].id | https://openalex.org/T10860 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.996399998664856 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Speech and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C82876162 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8151229619979858 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q17096504 |
| concepts[0].display_name | Latency (audio) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7512373924255371 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C43126263 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5298150777816772 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q128751 |
| concepts[2].display_name | Source code |
| concepts[3].id | https://openalex.org/C173608175 |
| concepts[3].level | 1 |
| concepts[3].score | 0.45229801535606384 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[3].display_name | Parallel computing |
| concepts[4].id | https://openalex.org/C3018397939 |
| concepts[4].level | 3 |
| concepts[4].score | 0.42438310384750366 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3644502 |
| concepts[4].display_name | Open source |
| concepts[5].id | https://openalex.org/C2776760102 |
| concepts[5].level | 3 |
| concepts[5].score | 0.42204582691192627 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[5].display_name | Code (set theory) |
| concepts[6].id | https://openalex.org/C79403827 |
| concepts[6].level | 1 |
| concepts[6].score | 0.349020779132843 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3988 |
| concepts[6].display_name | Real-time computing |
| concepts[7].id | https://openalex.org/C28490314 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3444843888282776 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[7].display_name | Speech recognition |
| concepts[8].id | https://openalex.org/C111919701 |
| concepts[8].level | 1 |
| concepts[8].score | 0.1910364329814911 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[8].display_name | Operating system |
| concepts[9].id | https://openalex.org/C2777904410 |
| concepts[9].level | 2 |
| concepts[9].score | 0.1523243486881256 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q7397 |
| concepts[9].display_name | Software |
| concepts[10].id | https://openalex.org/C199360897 |
| concepts[10].level | 1 |
| concepts[10].score | 0.1117071807384491 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[10].display_name | Programming language |
| concepts[11].id | https://openalex.org/C76155785 |
| concepts[11].level | 1 |
| concepts[11].score | 0.07947716116905212 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[11].display_name | Telecommunications |
| concepts[12].id | https://openalex.org/C177264268 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[12].display_name | Set (abstract data type) |
| keywords[0].id | https://openalex.org/keywords/latency |
| keywords[0].score | 0.8151229619979858 |
| keywords[0].display_name | Latency (audio) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7512373924255371 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/source-code |
| keywords[2].score | 0.5298150777816772 |
| keywords[2].display_name | Source code |
| keywords[3].id | https://openalex.org/keywords/parallel-computing |
| keywords[3].score | 0.45229801535606384 |
| keywords[3].display_name | Parallel computing |
| keywords[4].id | https://openalex.org/keywords/open-source |
| keywords[4].score | 0.42438310384750366 |
| keywords[4].display_name | Open source |
| keywords[5].id | https://openalex.org/keywords/code |
| keywords[5].score | 0.42204582691192627 |
| keywords[5].display_name | Code (set theory) |
| keywords[6].id | https://openalex.org/keywords/real-time-computing |
| keywords[6].score | 0.349020779132843 |
| keywords[6].display_name | Real-time computing |
| keywords[7].id | https://openalex.org/keywords/speech-recognition |
| keywords[7].score | 0.3444843888282776 |
| keywords[7].display_name | Speech recognition |
| keywords[8].id | https://openalex.org/keywords/operating-system |
| keywords[8].score | 0.1910364329814911 |
| keywords[8].display_name | Operating system |
| keywords[9].id | https://openalex.org/keywords/software |
| keywords[9].score | 0.1523243486881256 |
| keywords[9].display_name | Software |
| keywords[10].id | https://openalex.org/keywords/programming-language |
| keywords[10].score | 0.1117071807384491 |
| keywords[10].display_name | Programming language |
| keywords[11].id | https://openalex.org/keywords/telecommunications |
| keywords[11].score | 0.07947716116905212 |
| keywords[11].display_name | Telecommunications |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2311.00873 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2311.00873 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2311.00873 |
| locations[1].id | doi:10.48550/arxiv.2311.00873 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2311.00873 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5093192259 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Konstantine Sadov |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sadov, Konstantine |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5110406325 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Matthew M. Hutter |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Hutter, Matthew |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5093192260 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Asara Near |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Near, Asara |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2311.00873 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Low-latency Real-time Voice Conversion on CPU |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9991999864578247 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W2124842464, https://openalex.org/W2056396287, https://openalex.org/W2113128227, https://openalex.org/W632256878, https://openalex.org/W2491403535, https://openalex.org/W3081644756, https://openalex.org/W2479811461, https://openalex.org/W2104915799, https://openalex.org/W4311938462, https://openalex.org/W2355429491 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2311.00873 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2311.00873 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2311.00873 |
| primary_location.id | pmh:oai:arXiv.org:2311.00873 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2311.00873 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2311.00873 |
| publication_date | 2023-11-01 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 29, 35, 47, 53 |
| abstract_inverted_index.To | 68 |
| abstract_inverted_index.We | 0, 90 |
| abstract_inverted_index.as | 57, 59, 78, 80 |
| abstract_inverted_index.at | 34, 99 |
| abstract_inverted_index.in | 62 |
| abstract_inverted_index.of | 4, 15, 31, 37, 84 |
| abstract_inverted_index.on | 46 |
| abstract_inverted_index.to | 12, 64 |
| abstract_inverted_index.Our | 20 |
| abstract_inverted_index.and | 8, 39, 95 |
| abstract_inverted_index.any | 85 |
| abstract_inverted_index.has | 28 |
| abstract_inverted_index.our | 69 |
| abstract_inverted_index.the | 2, 13, 74, 81 |
| abstract_inverted_index.2.8x | 42 |
| abstract_inverted_index.20ms | 33 |
| abstract_inverted_index.CPU. | 49 |
| abstract_inverted_index.LLVC | 23, 50, 71 |
| abstract_inverted_index.both | 52, 73 |
| abstract_inverted_index.runs | 40 |
| abstract_inverted_index.task | 14 |
| abstract_inverted_index.than | 44 |
| abstract_inverted_index.this | 66 |
| abstract_inverted_index.uses | 51 |
| abstract_inverted_index.well | 58, 79 |
| abstract_inverted_index.16kHz | 38 |
| abstract_inverted_index.adapt | 1 |
| abstract_inverted_index.audio | 6 |
| abstract_inverted_index.code, | 94 |
| abstract_inverted_index.model | 97 |
| abstract_inverted_index.order | 63 |
| abstract_inverted_index.under | 32 |
| abstract_inverted_index.usage | 77 |
| abstract_inverted_index.voice | 18, 87 |
| abstract_inverted_index.attain | 65 |
| abstract_inverted_index.faster | 43 |
| abstract_inverted_index.lowest | 75, 82 |
| abstract_inverted_index.model, | 22 |
| abstract_inverted_index.model. | 89 |
| abstract_inverted_index.nearly | 41 |
| abstract_inverted_index.neural | 10 |
| abstract_inverted_index.bitrate | 36 |
| abstract_inverted_index.latency | 30, 83 |
| abstract_inverted_index.provide | 91 |
| abstract_inverted_index.weights | 98 |
| abstract_inverted_index.achieves | 72 |
| abstract_inverted_index.consumer | 48 |
| abstract_inverted_index.networks | 11 |
| abstract_inverted_index.previous | 5 |
| abstract_inverted_index.resource | 76 |
| abstract_inverted_index.samples, | 93 |
| abstract_inverted_index.knowledge | 60, 70 |
| abstract_inverted_index.real-time | 16, 45 |
| abstract_inverted_index.resulting | 21 |
| abstract_inverted_index.any-to-one | 17 |
| abstract_inverted_index.conversion | 88 |
| abstract_inverted_index.generation | 9 |
| abstract_inverted_index.generative | 54 |
| abstract_inverted_index.pretrained | 96 |
| abstract_inverted_index.adversarial | 55 |
| abstract_inverted_index.conversion. | 19 |
| abstract_inverted_index.open-source | 86, 92 |
| abstract_inverted_index.architecture | 56 |
| abstract_inverted_index.distillation | 61 |
| abstract_inverted_index.manipulation | 7 |
| abstract_inverted_index.performance. | 67 |
| abstract_inverted_index.architectures | 3 |
| abstract_inverted_index.$\textbf{V}$oice | 26 |
| abstract_inverted_index.$\textbf{C}$onversion), | 27 |
| abstract_inverted_index.$\textbf{L}$ow-resource | 25 |
| abstract_inverted_index.($\textbf{L}$ow-latency | 24 |
| abstract_inverted_index.https://github.com/KoeAI/LLVC. | 100 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
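The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such rows could be turned back into running text follows. The helper names `parse_rows` and `rebuild_text` are hypothetical, the row format is assumed to match the two-column table shown here, and the `limit` argument exists only so that a partial set of rows yields a gap-free prefix.

```python
from typing import Dict, Iterable, List, Optional

PREFIX = "abstract_inverted_index."


def parse_rows(rows: Iterable[str]) -> Dict[str, List[int]]:
    """Parse '| abstract_inverted_index.TOKEN | p1, p2 |' rows into token -> positions."""
    index: Dict[str, List[int]] = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not inverted-index entries
        token = cells[0][len(PREFIX):]
        index[token] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_text(index: Dict[str, List[int]], limit: Optional[int] = None) -> str:
    """Place each token at its positions and join the tokens in position order."""
    by_position: Dict[int, str] = {}
    for token, positions in index.items():
        for pos in positions:
            if limit is None or pos < limit:
                by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Four rows copied from the payload table; they cover positions 0 to 3.
rows = [
    "| abstract_inverted_index.We | 0, 90 |",
    "| abstract_inverted_index.adapt | 1 |",
    "| abstract_inverted_index.the | 2, 13, 74, 81 |",
    "| abstract_inverted_index.architectures | 3 |",
]

print(rebuild_text(parse_rows(rows), limit=4))  # We adapt the architectures
```

Passing every `abstract_inverted_index.*` row and leaving `limit` unset should reproduce the full abstract, with punctuation preserved because it is stored as part of each token (for example `conversion.` and `code,`).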