Zhiyao Duan
Turning Patients’ Open-Ended Narratives of Chronic Pain Into Quantitative Measures: Natural Language Processing Study
Background Subjective report of pain remains the gold standard for assessing symptoms in patients with chronic pain and their response to analgesics. This subjectivity underscores the importance of understanding patients’ personal narrativ…
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
Zero-shot online voice conversion (VC) holds significant promise for real-time communications and entertainment. However, current VC models struggle to preserve semantic fidelity under real-time constraints, deliver natural-sounding conver…
Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
Multi-Pitch Estimation (MPE) continues to be a sought-after capability of Music Information Retrieval (MIR) systems, and is critical for many applications and downstream tasks involving pitch, including music transcription. However, existi…
A Review on Score-based Generative Models for Audio Applications
Diffusion models have emerged as powerful deep generative techniques, producing high-quality and diverse samples in various domains, including audio. These models have many different design choices suitable for different app…
PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing
Neural speech editing enables seamless partial edits to speech utterances, allowing modifications to selected content while preserving the rest of the audio unchanged. This useful technique, however, also poses new risks of deepfakes. To e…
Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges
HARP 2.0: Expanding Hosted, Asynchronous, Remote Processing for Deep Learning in the DAW
HARP 2.0 brings deep learning models to digital audio workstation (DAW) software through hosted, asynchronous, remote processing, allowing users to route audio from a plug-in interface through any compatible Gradio endpoint to perform arbi…
Structural Design and Dynamic Analysis of a Deep Space Exploration Zoom Camera
Space optical cameras serve as vital tools for solar observation, mostly employing fixed-focus systems to reduce moving parts and increase system stability. However, with increasing demands for observation, maintaining consistent image siz…
Audio Visual Segmentation Through Text Embeddings
The goal of Audio-Visual Segmentation (AVS) is to localize and segment the sounding source objects from video frames. Research on AVS suffers from data scarcity due to the high cost of fine-grained manual annotations. Recent works attempt …
Measure by Measure: Measure-Based Automatic Music Composition with Modern Staff Notation
This paper introduces a hierarchical framework for automatic composition of polyphonic music in Western modern staff notation. Central to our framework, a music score is represented as a grid of part-wise measures, where each measure is en…
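The grid-of-measures score representation described in this abstract can be sketched as a simple data structure. This is an illustrative reconstruction under assumptions, not the authors' implementation; the `Score` and `Measure` names and their fields are hypothetical:

```python
from dataclasses import dataclass, field

# Illustrative sketch: a score as a grid of part-wise measures.
# grid[part][measure] holds the content of one part in one measure.

@dataclass
class Measure:
    # Hypothetical note encoding: (pitch name, onset in beats, duration in beats).
    notes: list = field(default_factory=list)

@dataclass
class Score:
    num_parts: int
    num_measures: int
    grid: list = field(default_factory=list)

    def __post_init__(self):
        # Build the parts x measures grid of empty measures.
        self.grid = [[Measure() for _ in range(self.num_measures)]
                     for _ in range(self.num_parts)]

    def measure(self, part: int, idx: int) -> Measure:
        return self.grid[part][idx]

# A two-part score of four measures; add one note to part 0, measure 2.
score = Score(num_parts=2, num_measures=4)
score.measure(0, 2).notes.append(("C4", 0.0, 1.0))
```

Indexing measures by (part, measure) pairs makes it natural to generate or edit the score one measure at a time, which matches the measure-based framing of the title.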
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge
With the advancements in singing voice generation and the growing presence of AI singers on media platforms, the inaugural Singing Voice Deepfake Detection (SVDD) Challenge aims to advance research in identifying AI-generated singing voice…
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Currently, a common approach in many speech processing tasks is to leverage large-scale pre-trained models by fine-tuning them on in-domain data for a particular application. Yet obtaining even a small amount of such data can be problemati…
A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection
This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of…
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory fe…
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensin…
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing v…
Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed time intervals tied to specific ev…
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Diffusion-based audio and music generation models commonly perform generation by constructing an image representation of audio (e.g., a mel-spectrogram) and then converting it to audio using a phase reconstruction model or vocoder. Typical vo…
Toward Fully Self-Supervised Multi-Pitch Estimation
Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performa…
Cacophony: An Improved Contrastive Audio-Text Model
Despite recent advancements, audio-text models still lag behind their image-text counterparts in scale and performance. In this paper, we propose to improve both the data scale and the training procedure of audio-text contrastive models. S…
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge (CtrSVDD Track, Training/Development Set)
For more information about SVDD Challenge 2024, please refer to https://challenge.singfake.org/. We have released the training and development set here and other relevant scripts on GitHub (https://github.com/SVDDChallenge/SVDD_Utils). For …
BeatNet+: Real-Time Rhythm Analysis for Diverse Music Audio
This paper presents a comprehensive study on real-time music rhythm analysis, covering joint beat and downbeat tracking for diverse kinds of music signals. We introduce BeatNet+, a two-stage approach to real-time rhythm analysis built on a…
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech
Dimensional representations of speech emotions, such as the arousal-valence (AV) representation, provide a more continuous and fine-grained description and control than their categorical counterparts. They have wide applications in tasks such as …
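The contrast drawn in this abstract, between a discrete label set and a continuous two-dimensional AV space, can be sketched with a simple lookup from category to coordinates. The numeric placements below are hypothetical illustrations loosely inspired by circumplex-style layouts, not values from the paper:

```python
# Illustrative sketch: mapping categorical emotion labels to rough
# (arousal, valence) coordinates, each in [-1, 1]. The specific numbers
# are assumed placeholders, not learned or published values.
CATEGORY_TO_AV = {
    "happy":   (0.6, 0.8),    # high arousal, positive valence
    "angry":   (0.8, -0.7),   # high arousal, negative valence
    "sad":     (-0.5, -0.7),  # low arousal, negative valence
    "neutral": (0.0, 0.0),    # origin of the AV plane
}

def to_av(label: str) -> tuple:
    """Return the (arousal, valence) point assigned to a categorical label."""
    return CATEGORY_TO_AV[label]
```

A learned model, as studied in the paper, would instead regress AV coordinates directly from speech, so that nearby points in the plane capture gradations a fixed label set cannot.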
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate on the latent domain with cascaded phase recovery modules to reconstruct the waveform. This poses challenges when generating high-fidelity audio. In …
SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription
Guitar tablature is a form of music notation widely used among guitarists. It captures not only the musical content of a piece, but also its implementation and ornamentation on the instrument. Guitar Tablature Transcription (GTT) is an imp…
Mitigating Cross-Database Differences for Learning Unified HRTF Representation
Individualized head-related transfer functions (HRTFs) are crucial for accurate sound positioning in virtual auditory displays. As the acoustic measurement of HRTFs is resource-intensive, predicting individualized HRTFs using machine learn…