Speech coding
librosa: Audio and Music Signal Analysis in Python
This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information ret…
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity,…
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-…
Detection and Classification of Acoustic Scenes and Events
For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate…
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis
Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surv…
Singing Voice Separation With Deep U-Net Convolutional Networks
High Fidelity Neural Audio Compression
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion. We simplify and speed-u…
The Playlist Experience: Personal Playlists in Music Streaming Services
Music streaming services encompass features that enable the organization of music into playlists. This article inquires how users describe and make sense of practices and experiences of creating, curating, maintaining, and using personal p…
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speak…
Robust Reversible Audio Watermarking Scheme for Telemedicine and Privacy Protection
The leakage of medical audio data in telemedicine seriously violates the privacy of patients. In order to avoid the leakage of patient information in telemedicine, a two-stage reversible robust audio watermarking algorithm is proposed to p…
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS …
ViSQOL: an objective speech quality model
This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception using a spectro-temporal measu…
Sound Source Localization Using Deep Learning Models
[Figure: Using a deep learning model, the robot locates the sound source from a multichannel audio stream input.] This study proposes the use of a deep neural network to localize a sound source …
ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
The 12th International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland (held online due to coronavirus outbreak), 26-28 May 2020
Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demon…
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per sec…
MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio
pp. 770-779
Music Recommender System Based on Genre using Convolutional Recurrent Neural Networks
With commercial music streaming services that can be accessed from mobile devices, digital music is now far more abundant than in previous eras. Sorting out all this digital music is very time-consuming and causes inf…
An Ensemble of Convolutional Neural Networks for Audio Classification
Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in…
Audio-Visual Multimedia Quality Assessment: A Comprehensive Survey
Measuring perceived quality of audio-visual signals at the end-user has become an important parameter in many multimedia networks and applications. It plays a crucial role in shaping audio-visual processing, compression, transmission and s…
A survey on deep reinforcement learning for audio-based applications
Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to eff…
Spread Spectrum-Based High Embedding Capacity Watermarking Method for Audio Signals
Audio watermarking is a promising technology for copyright protection of audio data. Built upon the concept of spread spectrum (SS), many SS-based audio watermarking methods have been developed, where a pseudonoise (PN) sequence is usually…
A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach
This paper proposes an audio-visual emotion recognition system that uses a mixture of rule-based and machine learning techniques to improve the recognition efficacy in the audio and video paths. The visual path is designed using the Bi-dir…
Self-Supervised Generation of Spatial Audio for 360 Video
We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video vi…
Convolutional Neural Networks to Enhance Coded Speech
Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors is a challenging task. In this paper, we propose two postprocessing approaches applying convolutional neural n…
A new method for voice signal features creation
Digital audio is one of the most important types of data at present. It is used in several applications, such as human knowledge and many security and banking applications. A digital voice signal is usually of a large size where the acoust…
DDSP: Differentiable Digital Signal Processing
Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is ge…
Towards Model Compression for Deep Learning Based Speech Enhancement
The use of deep neural networks (DNNs) has dramatically elevated the performance of speech enhancement over the last decade. However, achieving strong enhancement performance typically requires a large DNN, which is both memory and comput…
The Effect of Deep Learning Methods on Deepfake Audio Detection for Digital Investigation
Paper presented at CENTERIS – International Conference on ENTERprise Information Systems / ProjMAN – International Conference on Project MANagement / HCist – International Conference on Health and Social Care Information Systems and Te…
Teenagers, smartphones and digital audio consumption in the age of Spotify
The consolidation of smartphones as dominant devices for access to digital information and entertainment has redefined the processes of production and commercialization of cultural communication industries. The nature of these screens, whi…