Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

3 Jun 2024 | Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue
This paper introduces Stochastic Control Guidance (SCG), a novel method for symbolic music generation with non-differentiable rule guidance. SCG provides plug-and-play guidance for pre-trained diffusion models, steering them to generate samples that conform to specified musical rules without requiring those rules to be differentiable. The approach is inspired by stochastic control theory: generating samples that follow rule guidance is framed as an optimal control problem in a stochastic dynamical system. Because the method only requires forward evaluation of the rule functions, it applies directly to non-differentiable rules.

The paper also presents a latent diffusion architecture for symbolic music generation at high time resolution, which can be combined with SCG in a plug-and-play manner. This architecture enables the generation of dynamic music performances at 10 ms time resolution, a significant challenge for standard pixel-space diffusion models. The framework demonstrates state-of-the-art performance across various music generation tasks, offering superior rule guidance over popular methods and serving musicians as an effective compositional tool.

The method is evaluated on a wide range of symbolic music generation tasks, including unconditional generation, individual rule guidance, composite rule guidance, and editing. The results show that SCG significantly improves controllability under non-differentiable rules, achieving the lowest rule loss without training any surrogate model, and that it outperforms existing methods both on objective metrics such as overlapping area (OA) and in subjective evaluation. The paper further discusses the theoretical connection between stochastic optimal control and guidance methods, showing that many popular guidance techniques can be viewed through the lens of stochastic optimal control.
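Because SCG needs only forward evaluations of the rule function, the core idea can be illustrated as a best-of-N selection at each reverse-diffusion step: draw several candidate next states, score each by the rule loss of its predicted clean sample, and keep the best. The sketch below is a simplified toy illustration, not the paper's implementation; `denoiser` and `rule_loss` are hypothetical stand-ins, and the actual method derives this selection from a stochastic optimal control formulation.

```python
import numpy as np

def scg_step(x_t, t, denoiser, rule_loss, sigma, n_candidates=4, rng=None):
    """One guided reverse-diffusion step (toy sketch of the SCG idea):
    sample several candidate next states, score each with the (possibly
    non-differentiable) rule loss on a crude clean-sample estimate, and
    keep the best-scoring candidate. Only forward calls are needed."""
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_loss = None, np.inf
    for _ in range(n_candidates):
        noise = rng.standard_normal(x_t.shape)
        # Candidate next state: denoised mean plus scaled fresh noise.
        x_next = denoiser(x_t, t) + sigma * noise
        # Rough estimate of the final clean sample from this candidate.
        x0_hat = denoiser(x_next, t - 1)
        loss = rule_loss(x0_hat)  # forward evaluation only; no gradients
        if loss < best_loss:
            best_x, best_loss = x_next, loss
    return best_x
```

In this sketch the number of candidates trades compute for guidance strength: more candidates make it likelier that one of them satisfies the rule well, without ever differentiating through `rule_loss`.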
The proposed method is compatible with other diffusion model techniques like inpainting, outpainting, and editing, further enhancing its versatility in music generation. The results demonstrate that SCG is particularly beneficial for guiding the generation process with non-differentiable loss functions or achieving guidance without the need for additional training.
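The overlapping area (OA) metric mentioned above measures how much the distribution of a musical attribute (e.g., pitch or duration) in generated samples overlaps with that of real data. The snippet below is a simple histogram-based approximation of OA for illustration; evaluation frameworks for symbolic music often compute it from kernel density estimates instead.

```python
import numpy as np

def overlapping_area(a, b, bins=50):
    """Approximate overlapping area (OA) between two 1-D sample
    distributions using normalized histograms on a shared support.
    Returns a value in [0, 1]; 1 means identical distributions."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # density=True normalizes each histogram to integrate to 1.
    pa, edges = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    # Overlap is the integral of the pointwise minimum of the densities.
    return float(np.sum(np.minimum(pa, pb)) * width)
```

A higher OA between generated and ground-truth attribute distributions indicates that generation better matches the data statistics.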