Oliver Watts
Digital App for Speech and Health Monitoring Study (DASH): protocol for a prospective longitudinal case–control observational study for developing speech datasets in neurodegenerative disorders and dementia
Introduction Neurodegenerative disorders (NDDs) represent an unprecedented public health burden. These disorders are clinically heterogeneous and therapeutically challenging, but advances in discovery science and trial methodology offer ho…
Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
Monitoring the progression of neurodegenerative disease has important applications in the planning of treatment and the evaluation of future medications. Whereas much of the state-of-the-art in health monitoring from speech has been focuse…
Voice Conversion-based Privacy through Adversarial Information Hiding
Privacy-preserving voice conversion aims to remove only the attributes of speech audio that convey identity information, keeping other speech characteristics intact. This paper presents a mechanism for privacy-preserving voice conversion t…
Performance of data-driven inner speech decoding with same-task EEG-fMRI data fusion and bimodal models
Decoding inner speech from the brain signal via hybridisation of fMRI and EEG data is explored to investigate the performance benefits over unimodal models. Two different bimodal fusion approaches are examined: concatenation of probability…
PUFFIN: Pitch-Synchronous Neural Waveform Generation for Fullband Speech on Modest Devices
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system…
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system…
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks
Automatically predicting the outcome of subjective listening tests is a challenging task. Ratings may vary from person to person even if preferences are consistent across listeners. While previous work has focused on predicting listener…
Modern speech synthesis for phonetic sciences: a discussion and an evaluation
Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for…
Listening-test materials for "Where do the improvements come from in sequence-to-sequence neural TTS?"
This data release contains listening-test materials associated with the paper "Where do the improvements come from in sequence-to-sequence neural TTS?", presented at SSW10 (the 10th ISCA Speech Synthesis Workshop) in Vienna, Austria, 2019.
Where do the improvements come from in sequence-to-sequence neural TTS?
Sequence-to-sequence neural networks with attention mechanisms have recently been widely adopted for text-to-speech. Compared with older, more modular statistical parametric synthesis systems, sequence-to-sequence systems feature three pro…
Listening-test materials for "Modern speech synthesis for phonetic sciences: a discussion and an evaluation"
This data release contains listening-test materials associated with the paper "Modern speech synthesis for phonetic sciences: a discussion and an evaluation", presented at ICPhS 2019 in Melbourne, Australia.
Exemplar-based Speech Waveform Generation
This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder i…
Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data
There are many aspects of speech that we might want to control when creating text-to-speech (TTS) systems. We present a general method that enables control of arbitrary aspects of speech, which we demonstrate on the task of emotion control…
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis
Current approaches to statistical parametric speech synthesis using Neural Networks generally require input at the same temporal resolution as the output, typically a frame every 5ms, or in some cases at waveform sampling rate. It is there…
Learning Word Vector Representations Based on Acoustic Counts
This paper presents a simple count-based approach to learning word vector representations by leveraging statistics of cooccurrences between text and speech. This type of representation requires two discrete sequences of units defined acros…
Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili
When a text-to-speech (TTS) system is required to speak world news, a large fraction of the words to be spoken will be proper names originating in a wide variety of languages. Phonetization of these names based on target language letter-to…
The CSTR entry to the Blizzard Challenge 2016
Similar to the 2016 and 2017 Blizzard Challenges, the task for this year is to train on expressively-read children’s story-books, and to synthesise speech in the same domain. This gives us an opportunity to investigate the effectiveness of sever…
Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis
A top-down hierarchical system based on deep neural networks is investigated for the modeling of prosody in speech synthesis. Suprasegmental features are processed separately from segmental features and a compact distributed representation…
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is traine…
Robust TTS duration modelling using DNNs
Accurate modelling and prediction of speech-sound durations is an important component in generating more natural synthetic speech. Deep neural networks (DNNs) offer a powerful modelling paradigm, and large, found corpora of natural and exp…
Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning
We investigate two wavelet-based decomposition strategies of the f0 signal and their usefulness as a secondary task for speech synthesis using multi-task deep neural networks (MTL-DNN). The first decomposition strategy uses a static set of…
Listening test materials for "From HMMs to DNNs: Where do the improvements come from?"
This data release contains listening test materials associated with the paper "From HMMs to DNNs: Where do the improvements come from?", presented at ICASSP 2016 in Shanghai, China.
Listening test materials for "Robust TTS duration modelling using DNNs"
This data release contains listening test materials associated with the paper "Robust TTS duration modelling using DNNs", presented at ICASSP 2016 in Shanghai, China.
Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech"
Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word err…