Deep Learning for Audio Signal Processing Article Swipe
YOU?
·
· 2019
· Open Access
·
· DOI: https://doi.org/10.1109/jstsp.2019.2908700
· OA: W2931364255
Given the recent surge in developments of deep learning, this article\nprovides a review of the state-of-the-art deep learning techniques for audio\nsignal processing. Speech, music, and environmental sound processing are\nconsidered side-by-side, in order to point out similarities and differences\nbetween the domains, highlighting general methods, problems, key references,\nand potential for cross-fertilization between areas. The dominant feature\nrepresentations (in particular, log-mel spectra and raw waveform) and deep\nlearning models are reviewed, including convolutional neural networks, variants\nof the long short-term memory architecture, as well as more audio-specific\nneural network models. Subsequently, prominent deep learning application areas\nare covered, i.e. audio recognition (automatic speech recognition, music\ninformation retrieval, environmental sound detection, localization and\ntracking) and synthesis and transformation (source separation, audio\nenhancement, generative models for speech, sound, and music synthesis).\nFinally, key issues and future questions regarding deep learning applied to\naudio signal processing are identified.\n