28 May 2024 | Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
The paper introduces MusicMagus, a zero-shot approach to text-to-music editing with diffusion models. It addresses the problem of editing generated music by changing a specific attribute, such as genre, mood, or instrument, while leaving the other aspects of the piece unchanged. The method turns a text edit into a manipulation of the model's latent space and adds an extra constraint to keep the edited output consistent with the original. MusicMagus works with existing pre-trained text-to-music diffusion models and requires no additional training. In style and timbre transfer evaluations, it outperforms both zero-shot and supervised baselines, and the authors also demonstrate its practicality in real-world music editing scenarios. The main contributions are a flexible, user-friendly text-to-music editing method and MusicMagus itself, a system that performs zero-shot editing across diverse tasks without additional training.
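To make "turning a text edit into latent space manipulation" more concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the edit is represented as a direction in text-embedding space: the mean embedding of captions describing the target attribute (e.g. guitar) minus the mean embedding of captions describing the source attribute (e.g. piano). That direction is then added to the original prompt's embedding. `toy_embed` is a hypothetical hashed bag-of-words stand-in for the pre-trained model's text encoder. `consistency_penalty` is an assumed mean-squared-error form of the "extra constraint", comparing attention maps from the original and edited generations. All names here are illustrative.

```python
import hashlib
import math
from typing import List, Sequence

DIM = 16  # toy embedding size (assumption); a real system would use the model's own text encoder


def toy_embed(sentence: str, dim: int = DIM) -> List[float]:
    """Hypothetical text encoder: hash each word into one of `dim` buckets and count occurrences."""
    vec = [0.0] * dim
    for word in sentence.lower().split():
        bucket = hashlib.md5(word.encode("utf-8")).digest()[0] % dim
        vec[bucket] += 1.0
    return vec


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def edit_direction(source_captions: Sequence[str], target_captions: Sequence[str]) -> List[float]:
    """Assumed edit direction: mean target embedding minus mean source embedding."""
    src = mean_vector([toy_embed(s) for s in source_captions])
    tgt = mean_vector([toy_embed(t) for t in target_captions])
    return [t - s for s, t in zip(src, tgt)]


def apply_edit(prompt_embedding: Sequence[float], direction: Sequence[float],
               strength: float = 1.0) -> List[float]:
    """Shift the prompt embedding along the edit direction."""
    return [p + strength * d for p, d in zip(prompt_embedding, direction)]


def consistency_penalty(attn_original: Sequence[float], attn_edited: Sequence[float]) -> float:
    """Hypothetical constraint term: mean squared difference between two flattened attention maps."""
    if len(attn_original) != len(attn_edited):
        raise ValueError("attention maps must have the same size")
    return sum((a - b) ** 2 for a, b in zip(attn_original, attn_edited)) / len(attn_original)


source_captions = ["a relaxing piano melody", "a soft piano piece", "solo piano at night"]
target_captions = ["a relaxing guitar melody", "a soft guitar piece", "solo guitar at night"]

direction = edit_direction(source_captions, target_captions)
edited = apply_edit(toy_embed("a calm piano tune for studying"), direction)

# With this additive toy encoder, the edit amounts to swapping "piano" for "guitar".
reference = toy_embed("a calm guitar tune for studying")
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(edited, reference)))  # prints True
```

Because the toy encoder is a plain word count, the shifted embedding matches the embedding of the prompt with the word swapped exactly. With a real learned encoder, the shift would only approximate such a swap. In a diffusion model, a term like `consistency_penalty` would be minimized during sampling so that the edited piece keeps the structure of the original.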