Explanipedia

librosa: Audio and Music Signal Analysis in Python

Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, et al. · 2015

Computer science Art

This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information ret…

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, et al. · 2022

Computer science Geography Physics

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity,…

Deep Learning for Audio Signal Processing

H.‐G. Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, et al. · 2019

Computer science

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-…

Detection and Classification of Acoustic Scenes and Events

Dan Stowell, Dimitrios Giannoulis, Emmanouil Benetos, Mathieu Lagrange, Mark D. Plumbley · 2015

Computer science Mathematics Sociology

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate…

pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

Θεόδωρος Γιαννακόπουλος · 2015

Computer science

Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surv…

Singing Voice Separation With Deep U-Net Convolutional Networks

Andreas Jansson, Eric J. Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, et al. · 2017

Computer science Physics

High Fidelity Neural Audio Compression

Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi · 2022

Computer science Engineering Physics

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists in a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-u…

The Playlist Experience: Personal Playlists in Music Streaming Services

Anja Nylund Hagen · 2015

Computer science

Music streaming services encompass features that enable the organization of music into playlists. This article inquires how users describe and make sense of practices and experiences of creating, curating, maintaining, and using personal p…

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, et al. · 2021

Computer science Philosophy Economics

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speak…

Robust Reversible Audio Watermarking Scheme for Telemedicine and Privacy Protection

Xiaorui Zhang, Xun Sun, Xingming Sun, Wei Sun, Sunil Kumar Jha · 2021

Computer science Chemistry

The leakage of medical audio data in telemedicine seriously violates the privacy of patients. In order to avoid the leakage of patient information in telemedicine, a two-stage reversible robust audio watermarking algorithm is proposed to p…

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, et al. · 2023

Computer science Physics Biology

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS …

ViSQOL: an objective speech quality model

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte · 2015

Computer science Engineering Geography

This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception using a spectro-temporal measu…

Sound Source Localization Using Deep Learning Models

Nelson Yalta, Kazuhiro Nakadai, Tetsuya Ogata · 2017

Computer science Chemistry Mathematics

This study proposes the use of a deep neural network to localize a sound source …

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, et al. · 2020

Computer science Engineering Economics

The 12th International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland (held online due to coronavirus outbreak), 26-28 May 2020

Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder

Cristina Gârbacea, Aäron van den Oord, Yazhe Li, Felicia S. C. Lim, Alejandro Luebs, et al. · 2019

Computer science Mathematics

In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demon…

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Nicholas Carlini, David Wagner · 2018

Computer science Mathematics

We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per sec…

MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio

Jürgen Herre, Johannes Hilpert, Achim Kuntz, Jan Plogsties · 2015

Computer science Physics Mathematics

pp. 770-779

Music Recommender System Based on Genre using Convolutional Recurrent Neural Networks

Adiyansjah, Alexander Agung Santoso Gunawan, Derwin Suhartono · 2019

Computer science

With commercial music streaming services that can be accessed from mobile devices, digital music is currently abundant compared to previous eras. Sorting out all this digital music is very time-consuming and causes inf…

An Ensemble of Convolutional Neural Networks for Audio Classification

Loris Nanni, Gianluca Maguolo, Sheryl Brahnam, Michelangelo Paci · 2021

Computer science

Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in…

Audio-Visual Multimedia Quality Assessment: A Comprehensive Survey

Zahid Akhtar, Tiago H. Falk · 2017

Computer science Philosophy Economics

Measuring perceived quality of audio-visual signals at the end-user has become an important parameter in many multimedia networks and applications. It plays a crucial role in shaping audio-visual processing, compression, transmission and s…

A survey on deep reinforcement learning for audio-based applications

Siddique Latif, Heriberto Cuayáhuitl, Farrukh Pervez, Fahad Shamshad, Hafiz Shehbaz Ali, et al. · 2022

Computer science Mathematics

Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to eff…

Spread Spectrum-Based High Embedding Capacity Watermarking Method for Audio Signals

Yong Xiang, Iynkaran Natgunanathan, Yue Rong, Song Guo · 2015

Computer science Chemistry

Audio watermarking is a promising technology for copyright protection of audio data. Built upon the concept of spread spectrum (SS), many SS-based audio watermarking methods have been developed, where a pseudonoise (PN) sequence is usually…

A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach

Kah Phooi Seng, Li-Minn Ang, Chien Shing Ooi · 2016

Computer science

This paper proposes an audio-visual emotion recognition system that uses a mixture of rule-based and machine learning techniques to improve the recognition efficacy in the audio and video paths. The visual path is designed using the Bi-dir…

Self-Supervised Generation of Spatial Audio for 360 Video

Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang · 2018

Computer science

We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video vi…

Convolutional Neural Networks to Enhance Coded Speech

Ziyue Zhao, Huijun Liu, Tim Fingscheidt · 2018

Computer science

Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors is a challenging task. In this paper, we propose two postprocessing approaches applying convolutional neural n…

A new method for voice signal features creation

Majed O. Dwairi, Amjad Y. Hendi, Mohamed S. Soliman, Ziad Alqadi · 2019

Computer science

Digital audio is one of the most important types of data at present. It is used in several applications, such as human knowledge and many security and banking applications. A digital voice signal is usually of a large size where the acoust…

DDSP: Differentiable Digital Signal Processing

Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, Adam P. Roberts · 2020

Computer science Art

Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is ge…

Towards Model Compression for Deep Learning Based Speech Enhancement

Ke Tan, DeLiang Wang · 2021

Computer science Medicine

The use of deep neural networks (DNNs) has dramatically elevated the performance of speech enhancement over the last decade. However, to achieve strong enhancement performance typically requires a large DNN, which is both memory and comput…

The Effect of Deep Learning Methods on Deepfake Audio Detection for Digital Investigation

Mvelo Mcuba, Avinash Singh, Richard A. Ikuesan, Hein S. Venter · 2023

Computer science Philosophy

Paper presented at CENTERIS – International Conference on ENTERprise Information Systems / ProjMAN – International Conference on Project MANagement / HCist – International Conference on Health and Social Care Information Systems and Te…

Teenagers, smartphones and digital audio consumption in the age of Spotify

Luis Miguel Pedrero Esteban, Andrés Barrios Rubio, Virginia Medina Ávila · 2019

Business Sociology Art

The consolidation of smartphones as dominant devices for access to digital information and entertainment has redefined the processes of production and commercialization of cultural communication industries. The nature of these screens, whi…
