Ethan Manilow
Bioacoustics as a Measure of Population Size and Breeding Success of European Storm Petrels Hydrobates pelagicus
Obtaining measures of population size and fitness is a key first step to understanding how and why species' populations change over time. Quantifying such metrics is difficult in some species, however, due to their remote location and/or e…
SingSong: Generating musical accompaniments from singing
We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build …
Redefining Relationships in Music
AI tools increasingly shape how we discover, make and experience music. While these tools have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detr…
Multi-instrument Music Synthesis with Spectrogram Diffusion
An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-speci…
Scaling Polyphonic Transcription with Mixtures of Monophonic Transcriptions
Automatic Music Transcription (AMT), in particular the problem of automatically extracting notes from audio, has seen much recent progress via the training of neural network models on musical audio recordings paired with aligned ground-tru…
Music Separation Enhancement with Generative Modeling
Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing m…
The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling
Data is the lifeblood of modern machine learning systems, including those in Music Information Retrieval (MIR). However, MIR has long been mired in small datasets and unreliable labels. In this work, we propose to break this bottleneck…
Improving Source Separation by Explicitly Modeling Dependencies Between Sources
We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, w…
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Musical expression requires control of both what notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatena…
Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition
Deep learning work on musical instrument recognition has generally focused on instrument classes for which we have abundant data. In this work, we exploit hierarchical relationships between instruments in a few-shot learning setup to enabl…
Sequence-to-Sequence Piano Transcription with Transformers
Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets. However, these models have required extensive domain-specific design of network architectures, input/out…
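As a rough illustration of how such sequence-to-sequence transcribers sidestep task-specific architectures, here is a toy Python sketch that flattens piano notes into a MIDI-like token sequence of the kind a generic encoder-decoder Transformer can be trained to emit. The event names, the ordering rule, and the 10 ms time grid are illustrative assumptions, not the paper's exact vocabulary.

```python
# Toy sketch of a MIDI-like output vocabulary for seq2seq transcription.
# Event names and the 10 ms time grid are illustrative assumptions,
# not the exact scheme used in the paper.
from typing import List, NamedTuple

class Note(NamedTuple):
    onset: float   # seconds
    offset: float  # seconds
    pitch: int     # MIDI pitch, 0-127
    velocity: int  # MIDI velocity, 1-127

def notes_to_tokens(notes: List[Note], time_step: float = 0.01) -> List[str]:
    """Flatten notes into shift / velocity / note-on / note-off tokens."""
    # The integer priority forces velocity tokens before note-ons that
    # share the same timestamp.
    events = []
    for n in notes:
        events.append((n.onset, 0, f"velocity_{n.velocity}"))
        events.append((n.onset, 1, f"note_on_{n.pitch}"))
        events.append((n.offset, 2, f"note_off_{n.pitch}"))
    tokens, now = [], 0.0
    for time, _, tok in sorted(events):
        steps = round((time - now) / time_step)
        if steps > 0:
            tokens.append(f"shift_{steps}")  # coarse relative time shift
            now += steps * time_step
        tokens.append(tok)
    return tokens

# Two overlapping notes: C4 then E4.
print(notes_to_tokens([Note(0.0, 0.5, 60, 100), Note(0.25, 0.75, 64, 90)]))
```

A decoder trained against targets like these needs no bespoke onset or frame heads; the token vocabulary itself carries the musical structure.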
MT3: Multi-Task Multitrack Music Transcription
Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a challenging task at the core of music understanding. Unlike Automatic Speech Recognition (ASR), which typically focuses on the words of a single speaker, AMT…
Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit
We present a software framework that integrates neural networks into the popular open-source audio editing software, Audacity, with a minimal amount of developer effort. In this paper, we showcase some example use cases for both end-users …
Unsupervised Source Separation By Steering Pretrained Music Models
We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a la…
Hierarchical musical instrument separation
Many sounds that humans encounter are hierarchical in nature; a piano note is one of many played during a performance on a piano, which is one of many instruments in a band, which might be playing in a bar with other noises occurring. Inspired by thi…
Bespoke Neural Networks for Score-Informed Source Separation
In this paper, we introduce a simple method that can separate arbitrary musical instruments from an audio mixture. Given an unaligned MIDI transcription for a target instrument from an input mixture, we synthesize new mixtures from the mid…
Towards Musically Meaningful Explanations Using Source Separation
Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks. Such models are usually considered "black boxes", meaning that their predictions are not interpretable. Prior work on explai…
audioLIME: Listenable Explanations Using Source Separation
Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks but their predictions are usually not interpretable. We propose audioLIME, a method based on Local Interpretable Model-agnost…
Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments
We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, lea…
Slakh2100
Introduction: The Synthesized Lakh (Slakh) Dataset is a collection of multi-track audio and aligned MIDI for music source separation and multi-instrument automatic transcription. Individual MIDI tracks are synthesized from the Lakh MIDI Datas…
BabySlakh
Introduction: BabySlakh is a tiny version of Slakh2100 (zenodo link) that is useful for debugging and prototyping. It consists of the first 20 tracks of Slakh2100 (i.e., Track00001 through Track00020, inclusive). All of the audio is in the wav format and has a sample…
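For readers who want to poke at these datasets, here is a minimal sketch of walking the per-track layout (TrackXXXXX folders holding a mix file, a stems folder of per-instrument audio, and aligned MIDI). The exact file and folder names, the root path, and the soundfile dependency are assumptions, not an official loader.

```python
# Minimal sketch for iterating BabySlakh/Slakh2100 tracks.
# Assumed layout: Track00001/ ... each with a mix file, a stems/ folder of
# per-instrument audio, and a MIDI/ folder of aligned per-stem MIDI.
# File names, extensions, and the root path below are assumptions.
from pathlib import Path

import soundfile as sf  # third-party: pip install soundfile

def iter_tracks(root: str):
    """Yield (track_name, mix, sample_rate, stem_paths, midi_paths)."""
    for track_dir in sorted(Path(root).glob("Track*")):
        mix, sr = sf.read(str(track_dir / "mix.wav"))
        stems = sorted((track_dir / "stems").glob("*.wav"))
        midis = sorted((track_dir / "MIDI").glob("*.mid"))
        yield track_dir.name, mix, sr, stems, midis

for name, mix, sr, stems, midis in iter_tracks("babyslakh_16k"):
    print(f"{name}: {mix.shape[0] / sr:.1f}s mix, "
          f"{len(stems)} stems, {len(midis)} MIDI files")
```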
Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity
Music source separation performance has greatly improved in recent years with the advent of approaches based on deep learning. Such methods typically require large amounts of labelled training data, which in the case of music consist of mi…
WHAM!: Extending Speech Separation to Noisy Environments
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setu…