Explanipedia

Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation

Xueyao Zhang, J. S. Zhang, Yuancheng Wang, Chaoren Wang, Yuanzhe Chen, et al. · 2025

Controllable human voice generation, particularly for expressive domains like singing, remains a significant challenge. This paper introduces Vevo2, a unified framework for controllable speech and singing voice generation. To tackle issues…

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN

Yicheng Gu, Chaoren Wang, Zhizheng Wu, Lauri Juvela · 2025

Pitch manipulation is the process of producers adjusting the pitch of an audio segment to a specific key and intonation, which is essential in music production. Neural-network-based pitch-manipulation systems have been popular in recent ye…

SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset

Yicheng Gu, Chaoren Wang, J. S. Zhang, Xueyao Zhang, Zihao Fang, et al. · 2025

The lack of a publicly-available large-scale and diverse dataset has long been a significant bottleneck for singing voice applications like Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we presen…

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

Xueyao Zhang, Yuancheng Wang, Chaoren Wang, Ziniu Li, Zhuo Chen, et al. · 2025

Modern zero-shot text-to-speech (TTS) systems, despite using extensive pre-training, often struggle in challenging scenarios such as tongue twisters, repeated words, code-switching, and cross-lingual synthesis, leading to intelligibility i…

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, et al. · 2025

Computer science · Geography

Recent advancements in speech generation have been driven by large-scale training datasets. However, current models struggle to capture the spontaneity and variability inherent in real-world human speech, as they are primarily trained on a…

Overview of the Amphion Toolkit (v0.2)

Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, et al. · 2025

Computer science

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation ta…

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, et al. · 2024

Computer science · Geography

Recent advancements in speech generation models have been significantly driven by the use of large-scale training data. However, producing highly spontaneous, human-like speech remains a challenge due to the scarcity of large, diverse, and…

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, et al. · 2024

Computer science · Physics

In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showca…

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Xueyao Zhang, Liumeng Xue, Yuancheng Wang, Yicheng Gu, Xi Chen, et al. · 2023

Computer science · Engineering · Economics

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that includes diverse generation tasks and models,…

The near surface vertical atmospheric electric field abnormality could be as a promising imminent precursor of major earthquakes

T. Chen, Han Wu, X.-X. Zhang, Chaoren Wang, Xu Jin, et al. · 2020

Geology · Environmental science · Medicine

A promising short term precursor of major earthquakes (EQ) is very crucial in saving people and preventing huge losses. Ez, atmospheric electrostatic field vertical component, under fair air conditions, is generally oriented downwards (pos…

Chaoren Wang