20 Jan 2024 | Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk
This paper presents a scalable reinforcement learning (RL) framework for improving text-to-image diffusion models by aligning them with human preferences, fairness, and compositional diversity. The authors propose a large-scale RL training algorithm that can be applied across millions of prompts and multiple reward functions. The method treats the multi-step denoising process as a Markov decision process (MDP) and uses a policy gradient approach to optimize diffusion models for various objectives, including human aesthetic preference, fairness, and object composition. The framework also incorporates distribution-based reward functions to enhance output diversity and multi-task joint training to optimize for multiple objectives simultaneously.
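To make the MDP framing concrete, below is a minimal sketch (not the authors' code) of a policy-gradient update over denoising trajectories: each reverse-diffusion step is treated as a Gaussian "action", and a trajectory-level reward (e.g. a human-preference score) weights the summed log-probabilities of the steps, REINFORCE-style. The tiny MLP stands in for a diffusion U-Net, and all names and shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Predicts the mean of p_theta(x_{t-1} | x_t, t); a stand-in for a U-Net."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t], dim=-1))

def policy_gradient_loss(model, trajectories, rewards, sigma: float = 0.1):
    """REINFORCE loss over full denoising trajectories.

    trajectories: list over timesteps of (x_t, x_{t-1}) tensor pairs, each [B, dim]
    rewards:      [B] trajectory-level rewards (e.g. preference-model scores)
    """
    # Normalizing rewards into advantages reduces gradient variance.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    logp_sum = 0.0
    for step, (x_t, x_prev) in enumerate(trajectories):
        t = torch.full((x_t.shape[0], 1), float(len(trajectories) - step))
        mean = model(x_t, t)
        dist = torch.distributions.Normal(mean, sigma)
        logp_sum = logp_sum + dist.log_prob(x_prev).sum(dim=-1)  # [B]
    return -(advantages.detach() * logp_sum).mean()

# Usage: sample trajectories and rewards from the current policy, then update.
model = ToyDenoiser(dim=8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
traj = [(torch.randn(4, 8), torch.randn(4, 8)) for _ in range(5)]  # placeholder rollouts
rewards = torch.randn(4)                                           # placeholder reward scores
loss = policy_gradient_loss(model, traj, rewards)
loss.backward()
opt.step()
```

Because the reward is only available after the final denoising step, it is shared across all steps of a trajectory; in practice a learned or running-mean baseline plays the role of the simple advantage normalization used above.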
The authors evaluate their approach on three reward functions: human preference, fairness/diversity, and compositionality. For human preference, they use ImageReward, an open-source reward model trained on human preference data, to align diffusion models with what people prefer. For fairness and diversity, they develop a distribution-level reward based on statistical parity over batches of generated images. For compositionality, they use an auxiliary object detector to check whether the objects specified in the prompt are correctly composed in the generated images.
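The following is a minimal sketch (an assumption about the mechanics, not the paper's exact formulation) of a distribution-level fairness reward: generate a batch of images for the same prompt, classify a sensitive attribute (e.g. a skin-tone bucket) with an auxiliary classifier, and reward the batch by how close the empirical attribute distribution is to uniform, here via negative total-variation distance. Every sample in the batch receives the shared batch-level reward.

```python
import torch

def statistical_parity_reward(attribute_ids: torch.Tensor, num_classes: int) -> torch.Tensor:
    """attribute_ids: [B] integer attribute class predicted for each generated image."""
    counts = torch.bincount(attribute_ids, minlength=num_classes).float()
    empirical = counts / counts.sum()                       # observed attribute distribution
    uniform = torch.full((num_classes,), 1.0 / num_classes)
    tv_distance = 0.5 * (empirical - uniform).abs().sum()   # 0 means perfectly balanced
    batch_reward = -tv_distance                             # higher reward = fairer batch
    return batch_reward.expand(attribute_ids.shape[0])      # same reward for every sample

# Usage: a batch of 8 images whose predicted skin-tone bucket skews toward class 0.
preds = torch.tensor([0, 0, 0, 0, 0, 1, 2, 1])
print(statistical_parity_reward(preds, num_classes=4))  # negative, since far from uniform
```

Because this reward is defined over the batch distribution rather than a single image, it cannot be expressed as a per-sample score, which is why the paper treats it separately from pointwise rewards such as ImageReward.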
The results show that the approach significantly outperforms existing methods in aligning diffusion models with human preferences: generated images are preferred by human evaluators 80.3% of the time over those from the base Stable Diffusion model. The method also improves the composition and diversity of generated samples, and the authors demonstrate that it reduces skin-tone bias and improves the accuracy of object composition in generated images.
The paper compares the method with several baseline approaches, including ReFL, RAFT, DRaFT, and a reward-weighted baseline, and shows that it achieves better performance across multiple metrics. The method is also more robust to reward hacking, where a model over-optimizes a single reward function at the expense of overall image quality. The authors conclude that their approach provides a scalable and effective way to improve diffusion models across a wide range of tasks: generating images that align with human preferences, are fair and diverse, and accurately represent object compositions.