Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control


28 Feb 2024 | Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine
Diffusion models are effective tools for capturing complex data distributions, but they often require fine-tuning to optimize specific properties, such as aesthetic quality in image generation or bioactivity in biological sequence generation. This fine-tuning process can lead to "reward collapse," where the model overfits to the learned reward function, resulting in poor sample quality and reduced diversity. To address this issue, the paper proposes ELEGANT (finE-tuning doubLe Entropy reGulArized coNTrol), which frames the fine-tuning problem as entropy-regularized control against a pre-trained diffusion model. The method optimizes a reward function while maintaining sample diversity and staying close to the pre-trained distribution. The paper provides theoretical and empirical evidence that ELEGANT generates diverse samples with high genuine rewards, effectively mitigating reward collapse. Applied to both image generation and biological sequence generation, the method shows superior performance compared to existing techniques.
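At a high level, the objective described here can be sketched as a reward term plus a divergence penalty to the pre-trained model. The following is a minimal schematic, not the paper's exact continuous-time formulation; the regularization weight α and the notation p_pre for the pre-trained distribution are assumptions introduced only for illustration:

```latex
% Schematic entropy-regularized fine-tuning objective (assumed form):
% maximize the expected reward of generated samples x while penalizing
% divergence of the fine-tuned distribution p_theta from the
% pre-trained diffusion model p_pre.
\max_{\theta}\;
  \mathbb{E}_{x \sim p_{\theta}}\!\left[ r(x) \right]
  \;-\; \alpha \, \mathrm{KL}\!\left( p_{\theta} \,\Vert\, p_{\mathrm{pre}} \right)
```

Under an objective of this form, the reward term pulls the fine-tuned model toward high-reward samples, while the KL term keeps it anchored to the pre-trained distribution, which is the mechanism that mitigates the reward collapse described above.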