David Kant
AutoMixer: Checkpoint Artifacts as Automatic Data Mixers
In language model training, it is desirable to equip models with capabilities from various tasks. However, it is not clear how to directly obtain the right data mixtures for these capabilities as the relationship between data and tasks is …
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
We introduce MusicFlow, a cascaded text-to-music generation model based on flow matching. Using self-supervised representations to bridge text descriptions and music audio, we construct two flow matching networks to model the c…
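The abstract is truncated before the method details; as background on the training objective it names, here is a minimal sketch of a standard conditional flow matching step. The network, conditioning, and data below are placeholders, not MusicFlow's cascaded architecture.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Toy velocity network v_theta(x_t, t, cond); a stand-in for one flow matching network."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Linear-path conditional flow matching: regress the constant velocity (x1 - x0)
    along the straight path from noise x0 to data x1 at a random time t."""
    x0 = torch.randn_like(x1)          # noise endpoint
    t = torch.rand(x1.size(0), 1)      # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1        # point on the straight path
    target_v = x1 - x0                 # velocity of that path
    return ((model(x_t, t, cond) - target_v) ** 2).mean()

# Usage with random stand-ins for audio representations and text embeddings.
model = VectorField(dim=64, cond_dim=32)
x1 = torch.randn(8, 64)
cond = torch.randn(8, 32)
loss = flow_matching_loss(model, x1, cond)
loss.backward()
```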
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low-frame-rate 48 kHz stereo variational autoencoder codec. Based on a diffu…
In-Context Prompt Editing For Conditional Audio Generation
Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily…
Stack-and-Delay: a new codebook pattern for music generation
In language-model-based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In part…
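The abstract is cut off before the proposed pattern is described. The sketch below only illustrates what a codebook pattern is in this setting, using the common per-codebook "delay" interleaving of the hierarchical token stacks; it is not the stack-and-delay schedule introduced in the paper.

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Shift codebook k right by k steps so the K tokens of one frame are
    predicted at different decoding steps. codes: (K, T) integer token grid."""
    K, T = codes.shape
    out = torch.full((K, T + K - 1), pad_id, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

def undo_delay_pattern(delayed: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Invert apply_delay_pattern to recover the original (K, T) grid."""
    K = delayed.shape[0]
    out = torch.empty((K, num_frames), dtype=delayed.dtype)
    for k in range(K):
        out[k] = delayed[k, k:k + num_frames]
    return out

# Example: 4 codebooks, 6 frames.
codes = torch.arange(24).reshape(4, 6)
delayed = apply_delay_pattern(codes, pad_id=-1)
assert torch.equal(undo_delay_pattern(delayed, 6), codes)
```

The choice of pattern trades off decoding steps against how much inter-codebook dependency the model can condition on at each step, which is the design space the paper's title refers to.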
Simple and Controllable Music Generation
We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised …
Self-Supervised Representations for Singing Voice Conversion
A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped f…
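As an illustration of the self-supervised representations the abstract refers to, here is a minimal sketch of extracting HuBERT features with torchaudio's pretrained bundle. The input file, layer choice, and downstream use are assumptions for illustration, not the paper's pipeline.

```python
import torch
import torchaudio

# Pretrained HuBERT bundle shipped with torchaudio.
bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("song.wav")        # placeholder singing excerpt
waveform = waveform.mean(dim=0, keepdim=True)     # downmix to mono
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # Returns a list of per-layer feature tensors of shape (1, frames, 768).
    features, _ = model.extract_features(waveform)

# An intermediate layer is often used as a (roughly speaker-invariant) content
# representation that a conversion decoder re-synthesizes in the target voice.
content = features[6]
print(content.shape)
```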
Recording Meta-Soundscapes: Synchronized multi-frame audio field-recording at a large spatial scale
As an investigation into alternatives to the predominant assumptions of conventional audio recording, the described research has sought to explore how field-recording of very large outdoor acoustic environments might be achieved. This was …
Machine Listening as a Generative Model: Happy Valley Band
ORGANVM PERCEPTVS is a collection of 11 songs for mixed ensemble written by translating machine listening analysis of pop songs into musical notation. Motivated by the idea that analysis algorithms inherently carry the values of the commun…