Feedback Efficient Online Fine-Tuning of Diffusion Models

2024 | Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani
This paper introduces a reinforcement learning (RL) procedure for fine-tuning diffusion models to maximize a target property, such as aesthetic quality in images or bioactivity in molecules, using as few feedback queries as possible. The difficulty is that the design space is high-dimensional while the feasible designs often lie on a low-dimensional manifold, so exploration must stay within that feasible region. The proposed method, SEIKO (Optimistic Finetuning of Diffusion models with KL constraint), interleaves reward learning with diffusion model updates and uses an uncertainty oracle to guide exploration. The authors analyze the algorithm theoretically, providing a regret guarantee, and validate it in three domains: images, biological sequences, and molecules. In the reported experiments, SEIKO outperforms existing methods in both reward and diversity.
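The interleaving described above can be illustrated with a small toy sketch. This is not the authors' implementation: it replaces the diffusion model with a categorical distribution over a handful of designs, uses a simple count-based bonus as a stand-in for the uncertainty oracle, and uses the closed-form solution of a KL-regularized objective in place of a diffusion model update. The names `kl_regularized_update` and `online_finetune`, and the parameters `alpha`, `bonus_scale`, and `noise`, are all hypothetical and chosen for illustration only.

```python
import math
import random


def kl_regularized_update(prior, optimistic_reward, alpha):
    """Closed-form maximizer of E_p[r] - alpha * KL(p || prior) over
    categorical distributions p: p(x) is proportional to prior(x) * exp(r(x) / alpha).
    Designs with zero prior mass (treated here as infeasible) keep zero mass."""
    m = max(optimistic_reward)  # subtract the max for numerical stability
    weights = [p * math.exp((r - m) / alpha)
               for p, r in zip(prior, optimistic_reward)]
    total = sum(weights)
    return [w / total for w in weights]


def online_finetune(prior, true_reward, rounds=5, batch=20,
                    alpha=0.5, bonus_scale=0.5, noise=0.05, seed=0):
    """Toy loop: query feedback, refit the reward estimate, update the model."""
    rng = random.Random(seed)
    n = len(prior)
    counts = [0] * n   # number of feedback queries per design
    sums = [0.0] * n   # running sum of observed rewards per design
    policy = list(prior)
    for _ in range(rounds):
        # 1. Sample designs from the current model and query noisy feedback.
        for x in rng.choices(range(n), weights=policy, k=batch):
            counts[x] += 1
            sums[x] += true_reward[x] + rng.gauss(0.0, noise)
        # 2. Reward learning: empirical mean plus an uncertainty bonus that
        #    shrinks as a design is queried more often (assumed oracle).
        optimistic = []
        for x in range(n):
            mean = sums[x] / counts[x] if counts[x] else 0.0
            optimistic.append(mean + bonus_scale / math.sqrt(counts[x] + 1))
        # 3. Model update: maximize the optimistic reward while staying
        #    close (in KL) to the pre-trained prior.
        policy = kl_regularized_update(prior, optimistic, alpha)
    return policy


if __name__ == "__main__":
    # The last design has a high reward but zero prior mass (infeasible).
    prior = [0.25, 0.25, 0.25, 0.25, 0.0]
    true_reward = [0.1, 0.2, 0.9, 0.3, 5.0]
    print(online_finetune(prior, true_reward))
```

In this sketch the KL term anchored to the prior is what keeps the infeasible design at zero probability no matter how large its reward is, while the bonus term pushes sampling toward designs that have received little feedback so far.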