arXiv (Cornell University)
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
June 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yu‐Ping Wang
Language model (LM) based audio generation frameworks, e.g., AudioLM, have recently achieved new state-of-the-art performance in zero-shot audio generation. In this paper, we explore the feasibility of LMs for zero-shot voice conversion. An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, then convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker. However, such an approach …
Computer Science
Artificial Intelligence
Paleontology
Biology
Philosophy