Explanipedia

Digital App for Speech and Health Monitoring Study (DASH): protocol for a prospective longitudinal case–control observational study for developing speech datasets in neurodegenerative disorders and dementia

Johnny Tam, Christine Weaver, Amarachi Ihenacho, Judith Newton, Bruce B. Virgo, et al. · 2025

Introduction: Neurodegenerative disorders (NDDs) represent an unprecedented public health burden. These disorders are clinically heterogeneous and therapeutically challenging, but advances in discovery science and trial methodology offer ho…

Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring

Jacob J Webber, Oliver Watts, Lovisa Wihlborg, David Wheatley, Johnny Tam, et al. · 2025

Monitoring the progression of neurodegenerative disease has important applications in the planning of treatment and the evaluation of future medications. Whereas much of the state-of-the-art in health monitoring from speech has been focuse…

Voice Conversion-based Privacy through Adversarial Information Hiding

Jacob J Webber, Oliver Watts, Gustav Eje Henter, Jennifer Williams, Simon King · 2024

Computer science Business

Privacy-preserving voice conversion aims to remove only the attributes of speech audio that convey identity information, keeping other speech characteristics intact. This paper presents a mechanism for privacy-preserving voice conversion t…

Performance of data-driven inner speech decoding with same-task EEG-fMRI data fusion and bimodal models

Holly Wilson, Scott Wellington, Foteini Liwicki, Vibha Gupta, Rajkumar Saini, et al. · 2023

Computer science Psychology Mathematics

Decoding inner speech from the brain signal via hybridisation of fMRI and EEG data is explored to investigate the performance benefits over unimodal models. Two different bimodal fusion approaches are examined: concatenation of probability…

PUFFIN: Pitch-Synchronous Neural Waveform Generation for Fullband Speech on Modest Devices

Oliver Watts, Lovisa Wihlborg, Cassia Valentini-Botinhao · 2023

Computer science

We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system…

Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

Oliver Watts, Lovisa Wihlborg, Cassia Valentini-Botinhao · 2022

Computer science Chemistry

We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system…

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter · 2022

Computer science Psychology Mathematics

Automatically predicting the outcome of subjective listening tests is a challenging task. Ratings may vary from person to person even if preferences are consistent across listeners. While previous work has focused on predicting listener…

Modern speech synthesis for phonetic sciences: a discussion and an evaluation

Zofia Malisz, Gustav Eje Henter, Cassia Valentini-Botinhao, Oliver Watts, Jonas Beskow, et al. · 2020

Computer science Psychology History

Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for…

Listening-test materials for "Where do the improvements come from in sequence-to-sequence neural TTS?"

Oliver Watts, Gustav Eje Henter, Jason Fong, Cassia Valentini-Botinhao · 2020

Computer science Psychology Biology

This data release contains listening-test materials associated with the paper "Where do the improvements come from in sequence-to-sequence neural TTS?", presented at SSW10 (the 10th ISCA Speech Synthesis Workshop) in Vienna, Austria, 2019.

Where do the improvements come from in sequence-to-sequence neural TTS?

Oliver Watts, Gustav Eje Henter, Jason Fong, Cassia Valentini-Botinhao · 2019

Computer science Biology

Sequence-to-sequence neural networks with attention mechanisms have recently been widely adopted for text-to-speech. Compared with older, more modular statistical parametric synthesis systems, sequence-to-sequence systems feature three pro…

Listening-test materials for "Modern speech synthesis for phonetic sciences: a discussion and an evaluation"

Zofia Malisz, Gustav Eje Henter, Cassia Valentini-Botinhao, Oliver Watts, Jonas Beskow, et al. · 2019

Psychology Computer science Philosophy

This data release contains listening-test materials associated with the paper "Modern speech synthesis for phonetic sciences: a discussion and an evaluation", presented at ICPhS 2019 in Melbourne, Australia.

Exemplar-based Speech Waveform Generation

Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic, Simon King · 2018

Computer science Physics Political science

This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder i…

Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data

Zack Hodari, Oliver Watts, Srikanth Ronanki, Simon King · 2018

Computer science Engineering Mathematics

There are many aspects of speech that we might want to control when creating text-to-speech (TTS) systems. We present a general method that enables control of arbitrary aspects of speech, which we demonstrate on the task of emotion control…

A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis

Srikanth Ronanki, Oliver Watts, Simon King · 2017

Computer science Mathematics Biology

Current approaches to statistical parametric speech synthesis using Neural Networks generally require input at the same temporal resolution as the output, typically a frame every 5ms, or in some cases at waveform sampling rate. It is there…

Learning Word Vector Representations Based on Acoustic Counts

Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi · 2017

Computer science Mathematics Biology

This paper presents a simple count-based approach to learning word vector representations by leveraging statistics of cooccurrences between text and speech. This type of representation requires two discrete sequences of units defined acros…

Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili

Joseph R. Mendelson, Pilar Oplustil, Oliver Watts, Simon King · 2017

Computer science Biology Philosophy

When a text-to-speech (TTS) system is required to speak world news, a large fraction of the words to be spoken will be proper names originating in a wide variety of languages. Phonetization of these names based on target language letter-to…

The CSTR entry to the Blizzard Challenge 2016

Thomas Merritt, Srikanth Ronanki, Zhizheng Wu, Oliver Watts · 2016

Computer science Engineering

Similar to 2016 and 2017 Blizzard Challenge, the task for this year is to train on expressively-read children’s story-books, and to synthesise speech in the same domain. This gives us an opportunity to investigate the effectiveness of sever…

Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis

Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi · 2016

Computer science

A top-down hierarchical system based on deep neural networks is investigated for the modeling of prosody in speech synthesis. Suprasegmental features are processed separately from segmental features and a compact distributed representation…

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks

Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, et al. · 2016

Computer science Physics

A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is traine…

Robust TTS duration modelling using DNNs

Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, et al. · 2016

Computer science Mathematics Art

Accurate modelling and prediction of speech-sound durations is an important component in generating more natural synthetic speech. Deep neural networks (DNNs) offer a powerful modelling paradigm, and large, found corpora of natural and exp…

Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning

Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi, Robert A. Clark · 2016

Computer science Economics

We investigate two wavelet-based decomposition strategies of the f0 signal and their usefulness as a secondary task for speech synthesis using multi-task deep neural networks (MTL-DNN). The first decomposition strategy uses a static set of…

Listening test materials for "From HMMs to DNNs: Where do the improvements come from?"

Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, Simon King · 2016

Psychology Computer science Biology

This data release contains listening test materials associated with the paper "From HMMs to DNNs: Where do the improvements come from?", presented at ICASSP 2016 in Shanghai, China.

Listening test materials for "Robust TTS duration modelling using DNNs"

Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, et al. · 2016

Mathematics Computer science Psychology

This data release contains listening test materials associated with the paper "Robust TTS duration modelling using DNNs", presented at ICASSP 2016 in Shanghai, China.

Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech"

Mirjam Wester, Oliver Watts, Gustav Eje Henter · 2016

Computer science Psychology Geography

Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word err…

Oliver Watts