BASS ACCOMPANIMENT GENERATION VIA LATENT DIFFUSION


2 Feb 2024 | Marco Pasini, Maarten Grachten, Stefan Lattner
The paper presents a system for generating musical accompaniment, specifically bass lines, that match an arbitrary input track. The method combines two components: an audio autoencoder that compresses audio waveforms into invertible latent representations, and a conditional latent diffusion model that, given the input mix, generates the latent encoding of a matching stem. To control the timbre of the generated samples, the authors introduce a technique that grounds the latent space to a user-provided reference style during diffusion sampling. They also adapt classifier-free guidance to avoid the distortions that arise at high guidance strengths in the unbounded latent space, which improves audio quality. The model is trained on pairs of mixes and matching bass stems, and quantitative experiments show that it can generate bass lines with user-specified timbres. The authors present the framework as a step toward generative AI tools that assist musicians in music production.
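The guidance problem mentioned above can be illustrated with a short NumPy sketch. `cfg_combine` is the standard classifier-free guidance rule, which extrapolates from the unconditional prediction toward the conditional one with strength `w`. `cfg_rescaled` and its `mix` parameter are hypothetical: they show one known remedy for over-amplified outputs (rescaling the guided prediction's standard deviation back to that of the conditional prediction), and this is not necessarily the adaptation the authors use.

```python
import numpy as np


def cfg_combine(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    """Standard classifier-free guidance.

    w = 0 returns the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


def cfg_rescaled(
    eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float, mix: float = 0.7
) -> np.ndarray:
    """Hypothetical guidance variant (assumption, not the paper's method).

    Rescales the guided prediction so its standard deviation matches the
    conditional prediction's, then blends the rescaled and plain guided
    outputs with weight `mix`. The std is computed over the whole array,
    i.e. the input is treated as a single latent.
    """
    guided = cfg_combine(eps_cond, eps_uncond, w)
    # Small epsilon avoids division by zero for a constant array.
    scale = eps_cond.std() / (guided.std() + 1e-8)
    rescaled = guided * scale
    return mix * rescaled + (1.0 - mix) * guided


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = rng.standard_normal((64, 256))    # stand-in conditional prediction
    uncond = rng.standard_normal((64, 256))  # stand-in unconditional prediction
    print("cond std:    ", cond.std())
    print("plain w=5:   ", cfg_combine(cond, uncond, 5.0).std())
    print("rescaled w=5:", cfg_rescaled(cond, uncond, 5.0, mix=1.0).std())
```

With independent random inputs and `w = 5`, the plain combination's standard deviation grows to several times that of the conditional prediction. In an unbounded latent space, values this far out of range are the kind that can decode into audible distortion. The rescaled variant pulls the spread back toward the conditional prediction's scale while keeping the guided direction.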