Explanipedia

SynBAD: Synthetic Binaural Audio Dataset Open

Davoud Shariat Panah, Dan Barry, Alessandro Ragano, Jan Skoglund, Andrew Hines · 2025

The SynBAD dataset contains synthetic binaural renders of various audio contents. The dataset samples were generated by applying specific Head Related Transfer Functions (HRTF) from subject D2 of the SADIE II database to various audio cont…

Binamix -- A Python Library for Generating Binaural Audio Datasets Open

Dan Barry, Davoud Shariat Panah, Alessandro Ragano, Jan Skoglund, Andrew Hines · 2025

The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions to generate binaural audio data sets for use in testing and validation. Binamix is a…

Perceptual Audio Coding: A 40-Year Historical Perspective Open

Jürgen Herre, Schuyler Quackenbush, Minje Kim, Jan Skoglund · 2025

In the history of audio and acoustic signal processing, perceptual audio coding has certainly excelled as a bright success story by its ubiquitous deployment in virtually all digital media devices, such as computers, tablets, mobile phones…

SCOREQ: Speech Quality Assessment with Contrastive Regression Open

Alessandro Ragano, Jan Skoglund, Andrew Hines · 2024

Psychology Computer science Mathematics

In this paper, we present SCOREQ, a novel approach for speech quality prediction. SCOREQ is a triplet loss function for contrastive regression that addresses the domain generalisation shortcoming exhibited by state of the art no-reference …

Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs Open

Minje Kim, Jan Skoglund · 2024

Computer science Mathematics

This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs …

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment Open

Alessandro Ragano, Jan Skoglund, Andrew Hines · 2023

Computer science Mathematics Economics

This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature e…

A Comparison Of Deep Learning MOS Predictors For Speech Synthesis Quality Open

Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard Becerra Martinez, Chandan K. Reddy, et al. · 2023

Computer science Mathematics Engineering

Speech synthesis quality prediction has made remarkable progress with the development of supervised and self-supervised learning (SSL) MOS predictors but some aspects related to the data are still unclear and require further study. In this…

Context-Based Evaluation of the Opus Audio Codec for Spatial Audio Content in Virtual Reality Open

Ben Lee, Tomasz Rudzki, Jan Skoglund, Gavin Kearney · 2023

Computer science Biology

This paper discusses the evaluation of Opus-compressed Ambisonic audio content through listening tests conducted in a virtual reality environment. The aim of this study was to investigate the effect that Opus compression has on the Basic Au…

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models Open

Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, et al. · 2023

Computer science Physics

We introduce LMCodec, a causal neural speech codec that provides high quality audio at very low bitrates. The backbone of the system is a causal convolutional codec that encodes audio into a hierarchy of coarse-to-fine tokens using residua…

Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset Open

Michael Chinen, Jan Skoglund, Chandan K. Reddy, Alessandro Ragano, Andrew Hines · 2022

Computer science Mathematics Chemistry

Non-reference speech quality models are important for a growing number of applications. The VoiceMOS 2022 challenge provided a dataset of synthetic voice conversion and text-to-speech samples with subjective labels. This study looks at the…

Speech quality assessment with WARP‐Q: From similarity to subsequence dynamic time warp cost Open

Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines · 2022

Computer science Mathematics

Speech coding has been shown to achieve good speech quality using either waveform matching or parametric reconstruction. For very low bit rate streams, recently developed generative speech models can reconstruct high‐quality wideband speec…

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers Open

Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund · 2022

Computer science Physics

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While …

A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality Open

Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard Becerra Martinez, Chandan K. Reddy, et al. · 2022

Computer science Mathematics Philosophy

Speech synthesis quality prediction has made remarkable progress with the development of supervised and self-supervised learning (SSL) MOS predictors but some aspects related to the data are still unclear and require further study. In this…

SoundStream: An End-to-End Neural Audio Codec Open

Neil Zeghidour, Alejandro Luebs, Ahmed S. Omran, Jan Skoglund, Marco Tagliasacchi · 2021

Computer science

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed by a fully convol…

Speech quality estimation with deep lattice networks Open

Michael Chinen, Jan Skoglund, Andrew Hines · 2021

Computer science Mathematics Economics

Intrusive subjective speech quality estimation of mean opinion score (MOS) often involves mapping a raw similarity score extracted from differences between the clean and degraded utterance onto MOS with a fitted mapping function. More rece…

Warp-Q: Quality Prediction for Generative Neural Speech Codecs Open

Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines · 2021

Computer science Mathematics Philosophy

Good speech quality has been achieved using waveform matching and parametric reconstruction coders. Recently developed very low bit rate generative codecs can reconstruct high quality wideband speech with bit streams less than 3 kb/s. Thes…

Generative Speech Coding with Predictive Variance Regularization Open

W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, et al. · 2021

Computer science Mathematics Engineering

The recent emergence of machine-learning based generative models for speech suggests a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the disto…

VISQOL: The Virtual Speech Quality Objective Listener Open

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte · 2021

Computer science Engineering Philosophy

A model of human speech quality perception has been developed to provide an objective measure for predicting subjective quality assessments. The Virtual Speech Quality Objective Listener (ViSQOL) model is a signal based full reference metr…

Handling Background Noise in Neural Speech Generation Open

Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, et al. · 2020

Computer science

Recent advances in neural-network based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise,…

Improving Opus Low Bit Rate Quality with Neural Speech Synthesis Open

Jan Skoglund, Jean-Marc Valin · 2020

Computer science Mathematics

The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the …

AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio Open

Miroslaw Narbutt, Jan Skoglund, Andrew S. Allen, Michael Chinen, Dan Barry, et al. · 2020

Computer science Engineering Sociology

Spatial audio is essential for creating a sense of immersion in virtual environments. Efficient encoding methods are required to deliver spatial audio over networks without compromising Quality of Service (QoS). Streaming service providers…

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric Open

Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, et al. · 2020

Computer science Engineering Economics

The 12th International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland (held online due to coronavirus outbreak), 26-28 May 2020

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders Open

Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines · 2020

Computer science Mathematics Philosophy

This study compares the performances of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates ar…

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric Open

Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, et al. · 2020

Computer science Engineering Philosophy

Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively,) provides improvements upon previous versions, in terms of …

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders Open

Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines · 2020

Computer science Mathematics Economics

This study compares the performances of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates…

Salient Speech Representations Based on Cloned Networks Open

W. Bastiaan Kleijn, Felicia S. C. Lim, Michael Chinen, Jan Skoglund · 2019

Computer science Biology Philosophy

We define salient features as features that are shared by signals that are defined as being equivalent by a system designer. The definition allows the designer to contribute qualitative information. We aim to find salient features that are…

A Real-Time Wideband Neural Vocoder at 1.6kb/s Using LPCNet Open

Jean-Marc Valin, Jan Skoglund · 2019

Computer science Mathematics Engineering

Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders, at the cost of very high complexity. In this work, we prese…

Generative Speech Enhancement Based on Cloned Networks Open

Michael Chinen, W. Bastiaan Kleijn, Felicia S. C. Lim, Jan Skoglund · 2019

Computer science Philosophy

We propose to implement speech enhancement by the regeneration of clean speech from a salient representation extracted from the noisy signal. The network that extracts salient features is trained using a set of weight-sharing clones of the…

Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes Open

Tomasz Rudzki, Ignacio Gomez-Lanzaco, Jessica Stubbs, Jan Skoglund, Damian Murphy, et al. · 2019

Computer science Mathematics Physics

The increasing popularity of Ambisonics as a spatial audio format for streaming services poses new challenges to existing audio coding techniques. Immersive audio delivered to mobile devices requires an efficient bitrate compression that d…

Jan Skoglund