This paper presents a novel approach to music style transfer using time-varying textual inversion and diffusion models. The method aims to capture musical attributes from minimal data, enabling the transfer of styles from specific instruments, natural sounds, and synthesized sound effects to arbitrary content music. The key contributions are:
1. **Time-Varying Textual Inversion**: A module that shifts the focus of the learned text embedding from texture to structure as the diffusion timestep increases, yielding a more precise style representation (see the first sketch after this list).
2. **Bias-Reduced Stylization**: A technique that guides the stylization process with partially noised mel-spectrograms, reducing bias and improving stability (see the second sketch after this list).
3. **Experimental Results**: The method outperforms existing approaches in both qualitative and quantitative evaluations, demonstrating high-quality style transfer with minimal reference data.
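To make the first contribution concrete, below is a minimal PyTorch sketch of a timestep-conditioned pseudo-word embedding. The class name `TimeVaryingToken`, the bucketing scheme, and all hyperparameters are illustrative assumptions, not the paper's implementation; the paper may realize the time dependence differently (e.g., with a small network over the timestep).

```python
# A minimal sketch of time-varying textual inversion, assuming a diffusion
# model with 1000 timesteps and a text encoder with 768-dim token embeddings.
# TimeVaryingToken and the bucket lookup are illustrative, not the paper's code.
import torch
import torch.nn as nn

class TimeVaryingToken(nn.Module):
    """Learnable pseudo-word embedding v* that depends on the diffusion timestep.

    The timestep range is split into buckets, each with its own learnable
    vector, so the prompt embedding seen at high-noise (structure-dominated)
    steps can differ from the one seen at low-noise (texture-dominated) steps.
    """
    def __init__(self, embed_dim=768, n_timesteps=1000, n_buckets=10):
        super().__init__()
        self.n_timesteps = n_timesteps
        self.n_buckets = n_buckets
        self.table = nn.Parameter(torch.randn(n_buckets, embed_dim) * 0.02)

    def forward(self, t):
        # t: (batch,) integer timesteps in [0, n_timesteps)
        bucket = (t * self.n_buckets) // self.n_timesteps
        return self.table[bucket]  # (batch, embed_dim)

# During training, the returned vector would replace the placeholder token's
# embedding in a prompt such as "a recording in the style of <S*>" before the
# text encoder runs; only `table` receives gradients.
token = TimeVaryingToken()
t = torch.randint(0, 1000, (4,))
v_star = token(t)  # (4, 768) timestep-conditioned embedding for <S*>
```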
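The second contribution can be sketched in the same spirit: the content mel-spectrogram is only partially noised before denoising resumes under the style embedding, so coarse structure survives while timbre is rewritten. `partial_noise`, `stylize`, `denoiser`, and `t_start` are hypothetical names, and the paper's specific bias-reduction step is not reproduced; this shows the generic partial-noising idea only.

```python
# A minimal sketch of stylization guided by a partially noised content
# mel-spectrogram. `denoiser` stands in for one reverse-diffusion step of a
# style-conditioned model; the bias-reduction details are not reproduced.
import torch

def partial_noise(mel, alpha_bar, t_start):
    """Apply the DDPM forward process up to t_start only, so the noised
    spectrogram still carries the content's coarse structure."""
    eps = torch.randn_like(mel)
    a = alpha_bar[t_start]
    return a.sqrt() * mel + (1 - a).sqrt() * eps

def stylize(mel, denoiser, style_emb, alpha_bar, t_start=600):
    x = partial_noise(mel, alpha_bar, t_start)
    # Denoise from t_start back to 0 under the style embedding; because x
    # retains the content's structure, melody and rhythm are preserved while
    # the style token steers texture and timbre.
    for t in range(t_start, -1, -1):
        x = denoiser(x, t, style_emb)  # one reverse step, t -> t-1
    return x

# Example noise schedule: linear betas as in DDPM.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1 - betas, dim=0)
```

Choosing `t_start` trades content preservation against style strength: smaller values keep more of the content, while larger values give the style more freedom.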
The paper also discusses related work in music style transfer and text-to-music generation, highlighting the limitations of current methods. The authors conducted experiments on a small-scale dataset and compared their method with state-of-the-art approaches, showing superior performance in content preservation and style fit. A user study further validated the effectiveness of the proposed method.