Chaoren Wang
Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation
Controllable human voice generation, particularly for expressive domains like singing, remains a significant challenge. This paper introduces Vevo2, a unified framework for controllable speech and singing voice generation. To tackle issues…
Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
Pitch manipulation, the process by which producers adjust the pitch of an audio segment to a specific key and intonation, is essential in music production. Neural-network-based pitch-manipulation systems have been popular in recent ye…
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
The lack of a publicly available large-scale and diverse dataset has long been a significant bottleneck for singing voice applications like Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we presen…
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Modern zero-shot text-to-speech (TTS) systems, despite using extensive pre-training, often struggle in challenging scenarios such as tongue twisters, repeated words, code-switching, and cross-lingual synthesis, leading to intelligibility i…
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Recent advancements in speech generation have been driven by large-scale training datasets. However, current models struggle to capture the spontaneity and variability inherent in real-world human speech, as they are primarily trained on a…
Overview of the Amphion Toolkit (v0.2)
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation ta…
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Recent advancements in speech generation models have been significantly driven by the use of large-scale training data. However, producing highly spontaneous, human-like speech remains a challenge due to the scarcity of large, diverse, and…
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showca…
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, aiming to ease the entry of junior researchers and engineers into these fields. It presents a unified framework that includes diverse generation tasks and models,…
The near surface vertical atmospheric electric field abnormality could be as a promising imminent precursor of major earthquakes
A promising short-term precursor of major earthquakes (EQ) is crucial for saving lives and preventing huge losses. Ez, the vertical component of the atmospheric electrostatic field, is generally oriented downwards under fair-weather conditions (pos…