Mirjam Wester
Non linear time compression of clear and normal speech at high rates Open
We compare a series of time compression methods applied to normal and clear speech. First we evaluate a linear (uniform) method applied to these styles as well as to naturally-produced fast speech. We found, in line with the literature, th…
Bot or not: exploring the fine line between cyber and human identity Open
Speech technology is rapidly entering the everyday through the large scale commercial impact of systems such as Apple Siri and Amazon Echo. Meanwhile technology that allows voice cloning, voice modification, speech recognition, speech anal…
Speech Synthesis for the Generation of Artificial Personality Open
A synthetic voice personifies the system using it. In this work we examine the impact text content, voice quality and synthesis system have on the perceived personality of two synthetic voices. Subjects rated synthetic utterances based on …
A bi-directional task-based corpus of learners’ conversational speech Open
This paper describes a corpus of task-based conversational speech produced by English and Spanish native talkers speaking English and Spanish as both a first and a second language. For cross-language comparability, speech material was elic…
Multidimensional scaling of systems in the Voice Conversion Challenge 2016 Open
This study investigates how listeners judge the similarity of voice converted voices using a talker discrimination task. The data used is from the Voice Conversion Challenge 2016. 17 participants from around the world took part in building…
The Voice Conversion Challenge 2016 Open
This paper describes the Voice Conversion Challenge 2016 devised by the authors to better understand different voice conversion (VC) techniques by comparing their performance on a common dataset. The task of the challenge was speaker conver…
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks Open
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is traine…
Robust TTS duration modelling using DNNs Open
Accurate modelling and prediction of speech-sound durations is an important component in generating more natural synthetic speech. Deep neural networks (DNNs) offer a powerful modelling paradigm, and large, found corpora of natural and exp…
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance Open
Due to copyright restrictions, access to the full text of this article is available only via subscription.
Listening test materials for "Robust TTS duration modelling using DNNs" Open
This data release contains listening test materials associated with the paper "Robust TTS duration modelling using DNNs", presented at ICASSP 2016 in Shanghai, China.
SUPERSEDED - The Voice Conversion Challenge 2016 Open
THIS VERSION HAS BEEN REPLACED DUE TO SOME OF THE FILES BEING CORRUPTED. PLEASE SEE THE NEW VERSION OF THIS DATASET AT https://doi.org/10.7488/ds/1575 . > The Voice Conversion Challenge (VCC) 2016, one of the special sessions at Interspeec…
The Voice Conversion Challenge, 2016: multidimensional scaling (MDS) listening test results Open
The Voice Conversion Challenge (VCC) 2016, one of the special sessions at Interspeech 2016, deals with speaker identity conversion, referred to as Voice Conversion (VC). The task of the challenge was speaker conversion, i.e., to transform the…
Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech" Open
Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word err…