2024-02-07 | Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour and Andrea Agostinelli
**MusicRL: Aligning Music Generation to Human Preferences**
Google DeepMind
**Introduction:**
Music generation has advanced rapidly, with models now capable of open-ended, high-fidelity, text-controlled music generation. However, musicality is subjective and captions only partially convey user intent, which makes purely supervised training difficult. To address this, MusicRL is introduced: a music generation system finetuned from human feedback. MusicRL starts from the pretrained MusicLM model and is finetuned with reinforcement learning to maximize sequence-level rewards for text adherence and audio quality. In addition, a dataset of 300,000 pairwise user preferences is collected and used to train a reward model. The resulting MusicRL-U model is preferred over the baseline in human evaluations.
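Training a reward model from pairwise preferences typically amounts to a pairwise ranking objective. The sketch below is a minimal, hedged illustration (not the paper's actual code), assuming a Bradley-Terry-style loss over pairs of generations for the same caption; `reward_model`, `audio_win`, and `audio_lose` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_model, caption, audio_win, audio_lose):
    """Bradley-Terry-style loss for training a reward model from pairwise preferences.

    reward_model(caption, audio) is assumed to return a scalar score per example;
    audio_win is the clip the user preferred, audio_lose the rejected one.
    """
    r_win = reward_model(caption, audio_win)    # score of the preferred generation
    r_lose = reward_model(caption, audio_lose)  # score of the rejected generation
    # Maximize the log-probability that the preferred clip is ranked higher.
    return -F.logsigmoid(r_win - r_lose).mean()
```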
**Method:**
MusicRL-R is finetuned on quality and text adherence rewards, while MusicRL-U is finetuned on user preferences. MusicRL-RU combines both approaches, outperforming all alternatives. Ablation studies show that text adherence and quality only account for part of the preference, highlighting the complexity of musical appreciation.
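The RL finetuning itself can be summarized as a KL-regularized, sequence-level policy update: the model is pushed toward generations with high reward while staying close to the pretrained MusicLM. Below is a minimal sketch of a REINFORCE-style surrogate under that assumption; the exact RL algorithm, reward weighting, and hyperparameters used in MusicRL may differ, and all names here are illustrative.

```python
import torch

def kl_regularized_rl_loss(logprobs, ref_logprobs, reward, beta=0.1):
    """REINFORCE-style surrogate for sequence-level RL finetuning.

    logprobs / ref_logprobs: per-token log-probs of the sampled audio-token
    sequence under the finetuned policy and the frozen pretrained model,
    shape (batch, seq_len).
    reward: sequence-level reward (e.g. quality, text adherence, or the
    user-preference reward model score), shape (batch,).
    beta: weight of the KL penalty keeping the policy near the pretrained model.
    """
    seq_logprob = logprobs.sum(dim=-1)
    kl_estimate = (logprobs - ref_logprobs).sum(dim=-1)  # simple per-sequence KL estimate
    shaped_reward = reward - beta * kl_estimate
    # Treat the shaped reward as a constant and increase the likelihood of
    # sampled sequences in proportion to it.
    return -(shaped_reward.detach() * seq_logprob).mean()
```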
**Results:**
MusicRL-R and MusicRL-U are preferred over the baseline in human evaluations, with MusicRL-RU being the best performing model. The study demonstrates that integrating human feedback can significantly improve music generation models.
**Conclusion:**
MusicRL is the first music generation system that aligns with human preferences through reinforcement learning from human feedback. The findings highlight the importance of user feedback in improving music generation models, emphasizing the subjective and complex nature of musical appeal.