Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control


Under Review | Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine
This paper introduces ELEGANT, a method for fine-tuning continuous-time diffusion models with entropy regularization to address reward collapse. Diffusion models excel at capturing complex data distributions, but fine-tuning them against an imperfect reward function can lead to overoptimization: the model exploits errors in the reward and collapses onto a narrow set of high-scoring but low-quality samples. ELEGANT frames fine-tuning as an entropy-regularized control problem, jointly optimizing the reward and maintaining diversity in the generated samples. Concretely, the method uses neural SDEs to directly optimize an entropy-enhanced reward, which keeps the generated distribution close to the pre-trained one. Theoretical and empirical results show that ELEGANT generates diverse samples with high genuine rewards, mitigating overoptimization. The approach is applied to both image generation and biological sequence generation (including protein and DNA sequences), demonstrating its effectiveness across multiple domains. The resulting framework is computationally efficient and theoretically grounded, outperforming existing fine-tuning techniques in both theoretical support and empirical performance. The paper also discusses the importance of KL regularization in preventing overoptimization and provides a detailed analysis of the algorithm's components and its performance across these tasks.
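To make the control formulation concrete, the following is a minimal sketch (not the authors' implementation) of entropy-regularized fine-tuning of a sampling SDE: the fine-tuned drift is trained, by backpropagating through an Euler-Maruyama rollout, to maximize a terminal reward minus a KL penalty to the frozen pre-trained drift, where the pathwise KL follows from Girsanov's theorem. The networks, reward, and names (`reward`, `ALPHA`, `rollout`) are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of entropy-regularized control fine-tuning of an SDE drift.
# Toy drift networks and a toy reward stand in for a real diffusion model
# and a learned reward model.
import torch
import torch.nn as nn

DIM, STEPS, DT, SIGMA, ALPHA = 2, 50, 0.02, 1.0, 0.1


def make_drift():
    return nn.Sequential(nn.Linear(DIM + 1, 64), nn.Tanh(), nn.Linear(64, DIM))


pretrained_drift = make_drift()            # frozen stand-in for the pre-trained drift
for p in pretrained_drift.parameters():
    p.requires_grad_(False)

finetuned_drift = make_drift()             # the control being optimized
finetuned_drift.load_state_dict(pretrained_drift.state_dict())


def reward(x):
    # Toy differentiable reward; in practice a learned reward model r(x_T).
    return -(x - 1.0).pow(2).sum(dim=-1)


def rollout(batch=128):
    """Euler-Maruyama simulation that also accumulates the pathwise KL.

    By Girsanov, KL(fine-tuned paths || pre-trained paths)
      = E[ sum_t ||u_theta(x,t) - u_pre(x,t)||^2 / (2 * sigma^2) * dt ].
    """
    x = torch.randn(batch, DIM)            # shared initial noise distribution
    kl = torch.zeros(batch)
    for step in range(STEPS):
        t = torch.full((batch, 1), step * DT)
        u_new = finetuned_drift(torch.cat([x, t], dim=-1))
        u_pre = pretrained_drift(torch.cat([x, t], dim=-1))
        kl = kl + (u_new - u_pre).pow(2).sum(-1) / (2 * SIGMA**2) * DT
        x = x + u_new * DT + SIGMA * (DT ** 0.5) * torch.randn_like(x)
    return x, kl


opt = torch.optim.Adam(finetuned_drift.parameters(), lr=1e-3)
for it in range(200):
    x_T, kl = rollout()
    # Entropy-regularized objective: maximize E[r(x_T)] - alpha * KL.
    loss = -(reward(x_T) - ALPHA * kl).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The KL term is what distinguishes this from naive reward maximization: setting `ALPHA = 0` recovers the overoptimization-prone objective in which the drift is free to move arbitrarily far from the pre-trained model.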