July 19, 2024 | Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. Diffusion models are powerful generative models capable of accurately capturing complex distributions. However, practical applications in domains such as biology require generating samples that maximize specific metrics, for example translation efficiency in RNA or docking scores in molecules. These tasks can be addressed by optimizing diffusion models to explicitly maximize the desired metric, a process rooted in reinforcement learning (RL) concepts. The tutorial explains how various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning, are tailored to diffusion model fine-tuning. It examines the strengths and limitations of these methods, their advantages over non-RL approaches, and their formal objectives. The tutorial also discusses connections with related topics such as classifier guidance, GFlowNets, flow-based diffusion models, path integral control theory, and sampling from unnormalized distributions. The code for this tutorial is available at https://github.com/masa-ue/RLfinetuning_Diffusion_Bioseq. The content is divided into three parts, covering the fundamentals of diffusion models, RL-based fine-tuning algorithms, and their connections with related methods. The tutorial aims to provide a holistic understanding of RL-based fine-tuning, including its advantages, challenges, and practical applications across domains.
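As a concrete point of reference, RL-based fine-tuning of diffusion models is commonly framed as KL-regularized reward maximization over samples from the fine-tuned model. The notation below is illustrative rather than taken from the tutorial: $p_\theta$ denotes the fine-tuned sample distribution, $p_{\mathrm{pre}}$ the pretrained one, $r$ the downstream reward (e.g., a docking score), and $\alpha$ a regularization weight.

$$
\max_{\theta}\; \mathbb{E}_{x \sim p_\theta}\big[\, r(x) \,\big] \;-\; \alpha\, \mathrm{KL}\!\left( p_\theta \,\|\, p_{\mathrm{pre}} \right)
$$

The KL term keeps the fine-tuned model close to the pretrained one, guarding against reward over-optimization, and the algorithms listed above (PPO, reward-weighted MLE, value-weighted sampling, and so on) can broadly be viewed as different ways of approximately optimizing such an objective over the denoising trajectory.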