Feedback Efficient Online Fine-Tuning of Diffusion Models


2024 | Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani
This paper introduces SEIKO, a feedback-efficient method for online fine-tuning of diffusion models. SEIKO interleaves reward learning with diffusion model updates, using KL regularization and an optimistic uncertainty bonus to encourage exploration while staying on the feasible manifold. The method is analyzed theoretically, with a regret guarantee, and validated across three domains: images, biological sequences, and molecules. By collecting feedback efficiently, it minimizes the number of queries to the true reward function while still generating high-reward, novel samples. The key contributions are a provably feedback-efficient procedure for online fine-tuning of diffusion models, the interleaving of reward learning with diffusion model updates, and the combination of an uncertainty model with KL regularization to keep exploration within the feasible space. Under a fixed feedback budget, SEIKO outperforms existing baselines in feedback efficiency and final reward, producing high-quality samples in complex, high-dimensional spaces.
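To make the iterative structure concrete, below is a minimal, self-contained sketch of a SEIKO-style loop in NumPy. The Gaussian surrogate sampler standing in for a diffusion model, the toy `true_reward`, and the bootstrap-ensemble uncertainty bonus are all illustrative assumptions, not the paper's exact construction; the sketch only mirrors the loop described above: sample, query limited feedback, refit a reward model with uncertainty, then update the sampler against an optimistic reward minus a KL penalty to the pretrained model.

```python
# Schematic SEIKO-style loop: interleave reward learning with model updates,
# using an optimism bonus for exploration and KL regularization to the
# pretrained model. All components here are simplified stand-ins.
import numpy as np

rng = np.random.default_rng(0)
DIM, SIGMA, BETA = 4, 1.0, 0.1          # sample dim, sampler std, KL weight
ROUNDS, BATCH, ENSEMBLE = 5, 16, 8      # feedback rounds, queries per round, ensemble size

def true_reward(x):
    """Hypothetical expensive reward (e.g. a lab assay); queried sparingly."""
    target = np.full(DIM, 2.0)
    return -np.sum((x - target) ** 2, axis=-1)

def sample(mu, n):
    """Draw n samples from the current model (Gaussian surrogate for a diffusion model)."""
    return mu + SIGMA * rng.standard_normal((n, DIM))

def fit_ensemble(X, r):
    """Bootstrap ensemble of ridge regressions; disagreement gives the optimism bonus."""
    models = []
    for _ in range(ENSEMBLE):
        idx = rng.integers(0, len(X), len(X))
        Xb = np.hstack([X[idx], np.ones((len(idx), 1))])
        w = np.linalg.solve(Xb.T @ Xb + 1e-2 * np.eye(DIM + 1), Xb.T @ r[idx])
        models.append(w)
    return np.array(models)

def optimistic_reward(models, X):
    """Mean ensemble prediction plus an uncertainty (exploration) bonus."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ models.T                       # shape (n, ENSEMBLE)
    return preds.mean(axis=1) + preds.std(axis=1)

def kl_gaussian(mu, mu_pre):
    """KL divergence between two isotropic Gaussians sharing std SIGMA."""
    return 0.5 * np.sum((mu - mu_pre) ** 2) / SIGMA**2

mu_pre = np.zeros(DIM)                          # "pretrained" model parameters
mu = mu_pre.copy()
X_all, r_all = np.empty((0, DIM)), np.empty(0)

for t in range(ROUNDS):
    # 1) Sample from the current model and query the true reward (the only feedback cost).
    X = sample(mu, BATCH)
    r = true_reward(X)
    X_all, r_all = np.vstack([X_all, X]), np.concatenate([r_all, r])

    # 2) Refit the reward model (with uncertainty) on all feedback collected so far.
    models = fit_ensemble(X_all, r_all)

    # 3) Update the model: maximize optimistic reward minus KL to the pretrained model.
    best, best_val = mu, -np.inf
    for _ in range(200):                        # crude candidate search over mean shifts
        cand = mu + 0.5 * rng.standard_normal(DIM)
        val = optimistic_reward(models, sample(cand, 64)).mean() \
              - BETA * kl_gaussian(cand, mu_pre)
        if val > best_val:
            best, best_val = cand, val
    mu = best
    print(f"round {t}: mean true reward = {true_reward(sample(mu, 256)).mean():.2f}")
```

In this toy setting the KL term keeps the updated sampler close to the pretrained one, while the ensemble-disagreement bonus pushes sampling toward regions where the learned reward is uncertain, which is the intuition behind the paper's exploration guarantee.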