LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

arXiv (Cornell University) • June 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yu‐Ping Wang

Language model (LM) based audio generation frameworks, e.g., AudioLM, have recently achieved new state-of-the-art performance in zero-shot audio generation. In this paper, we explore the feasibility of LMs for zero-shot voice conversion. An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, and convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker. However, such an approach …
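The intuitive AudioLM-style approach mentioned in the abstract can be pictured as plain next-token generation over one combined token sequence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the HuBERT and SoundStream tokenizers are replaced by hand-written integer lists, the ordering of the prompt and the separator id are assumptions, and `SEP`, `build_prompt`, `convert`, and `toy_next_token` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical separator id between the two token streams (an assumption).
SEP = -1


def build_prompt(source_semantic: Sequence[int],
                 target_acoustic: Sequence[int]) -> List[int]:
    """Concatenate source semantic tokens and target-speaker acoustic tokens.

    The ordering (semantic first, then acoustic prompt) is an assumption
    made for this sketch.
    """
    return list(source_semantic) + [SEP] + list(target_acoustic)


def convert(source_semantic: Sequence[int],
            target_acoustic: Sequence[int],
            next_token: Callable[[List[int]], int],
            max_new_tokens: int) -> List[int]:
    """Autoregressively generate acoustic tokens for the converted speech.

    `next_token` stands in for a trained language model: it receives the
    full context so far and returns the next acoustic token id.
    """
    context = build_prompt(source_semantic, target_acoustic)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        generated.append(next_token(context + generated))
    return generated


def toy_next_token(context: List[int]) -> int:
    """Placeholder 'model': returns a token id derived from context length."""
    return len(context) % 1024


if __name__ == "__main__":
    semantic = [11, 12, 13]   # stand-in for HuBERT-derived semantic tokens
    speaker = [501, 502]      # stand-in for SoundStream tokens of the target speaker
    print(convert(semantic, speaker, toy_next_token, max_new_tokens=4))
```

In a real system the generated token ids would be passed to the codec decoder to synthesize a waveform; that step is omitted here because it depends on the specific codec.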
