Philippe Esling
Keep what you need: extracting efficient subnetworks from large audio representation models
Recently, research on audio foundation models has witnessed notable advances, as illustrated by the ever improving results on complex downstream tasks. Subsequently, those pretrained networks have quickly been used for various audio applic…
Unsupervised Composable Representations for Audio
Current generative models are able to generate high-quality artefacts but have been shown to struggle with compositional reasoning, which can be defined as the ability to generate complex structures from simpler elements. In this paper, we…
Combining audio control and style transfer using latent diffusion
Deep generative models are now able to synthesize high-quality audio signals, shifting the critical aspect in their development from audio quality to control capabilities. Although text-to-music generation is getting largely adopted by the…
Embodied exploration of deep latent spaces in interactive dance-music performance
In recent years, significant advances have been made in deep learning models for audio generation, offering promising tools for musical creation. In this work, we investigate the use of deep audio generative models in interactive dance/mus…
Is Quality Enough? Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models
Deep learning models are now core components of modern audio synthesis, and their use has increased significantly in recent years, leading to highly accurate systems for multiple tasks. However, this quest for quality comes at a tremendous…
Continuous descriptor-based control for deep audio synthesis
Despite significant advances in deep models for music generation, the use of these techniques remains restricted to expert users. Before being democratized among musicians, generative models must first provide expressive control over the g…
SingSong: Generating musical accompaniments from singing
We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build …
Creative divergent synthesis with generative models
Machine learning approaches now achieve impressive generation capabilities in numerous domains such as image, audio or video. However, most training & evaluation frameworks revolve around the idea of strictly modelling the original data d…
Challenges in creative generative models for music: a divergence maximization perspective
The development of generative Machine Learning (ML) models in creative practices, enabled by the recent improvements in usability and availability of pre-trained models, is raising more and more interest among artists, practitioners and pe…
Streamable Neural Audio Synthesis With Non-Causal Convolutions
Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Althou…
HEAR: Holistic Evaluation of Audio Representations
What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a…
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis
Deep generative models applied to audio have improved by a large margin the state-of-the-art in many speech and music related tasks. However, as raw waveform modelling remains an inherently difficult task, audio generative models are eithe…
Combining Real-Time Extraction and Prediction of Musical Chord Progressions for Creative Applications
Recently, the field of musical co-creativity has gained some momentum. In this context, our goal is twofold: to develop an intelligent listening and predictive module of chord sequences, and to propose an adapted evaluation of the associat…
Energy Consumption of Deep Generative Audio Models.
In most scientific domains, the deep learning community has largely focused on the quality of deep generative models, resulting in highly accurate and successful solutions. However, this race for quality comes at a tremendous computational…
Signal-domain representation of symbolic music for learning embedding spaces
A key aspect of machine learning models lies in their ability to learn efficient intermediate features. However, the input representation plays a crucial role in this process, and polyphonic musical scores remain a particularly complex typ…
Energy Consumption of Deep Generative Audio Models
In most scientific domains, the deep learning community has largely focused on the quality of deep generative models, resulting in highly accurate and successful solutions. However, this race for quality comes at a tremendous computational…
Proceedings of the 2020 Joint Conference on AI Music Creativity
Modern approaches to sound synthesis using deep neural networks are hard to control, especially when fine-grained conditioning information is not available, hindering their adoption by musicians. In this paper, we cast the generation o…
Attributes-aware deep music transformation
Recent machine learning techniques have enabled a large variety of novel music generation processes. However, most approaches do not provide any form of interpretable control over musical attributes, such as pitch and rhythm. Obtaining con…
Creativity in the era of artificial intelligence
Creativity is a deeply debated topic, as this concept is arguably quintessential to our humanity. Across different epochs, it has been infused with an extensive variety of meanings relevant to that era. Along these, the evolution of techno…
Timbre latent space: exploration and creative aspects
Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders. They enable high-quality sound synthesis but offer only limited control, since the latent spaces do not disentangle timbre properti…
Neural Granular Sound Synthesis
Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows. In order to control the synthesis, all grains in a given corpus are analyzed through a set of acoustic descriptors. …
Diet deep generative audio models with structured lottery
Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computation cost. This aspect is almost always overlooked i…
Ultra-light deep MIR by trimming lottery tickets
Current state-of-the-art results in Music Information Retrieval are largely dominated by deep learning approaches. These provide unprecedented accuracy across all tasks. However, the consistently overlooked downside of these models is thei…
Vector-Quantized Timbre Representation
Timbre is a set of perceptual attributes that identifies different types of sound sources. Although its definition is usually elusive, it can be seen from a signal processing viewpoint as all the spectral features that are perceived indepe…
FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows
Audio synthesizers are pervasive in modern music production. These highly complex audio generation functions provide a unique diversity through their large sets of parameters. However, this feature also can make them extremely hard and obf…
Cross-modal variational inference for bijective signal-symbol translation
Extraction of symbolic information from signals is an active field of research enabling numerous applications especially in the Musical Information Retrieval domain. This complex task, that is also related to other topics such as pitch ext…
Flow Synthesizer: Universal Audio Synthesizer Control with Normalizing Flows
The ubiquity of sound synthesizers has reshaped modern music production, and novel music genres are now sometimes even entirely defined by their use. However, the increasing complexity and number of parameters in modern synthesizers make t…
Using musical relationships between chord labels in automatic chord extraction tasks
Recent research on Automatic Chord Extraction (ACE) has focused on the improvement of models based on machine learning. However, most models still fail to take into account the prior knowledge underlying the labeling alphabets (chord la…
Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network
This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although…