Aligning Diffusion Models by Optimizing Human Utility

11 Oct 2024 | Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka
The paper introduces Diffusion-KTO, an approach for aligning text-to-image (T2I) diffusion models with human preferences by framing alignment as utility maximization. Unlike previous methods, which require pairwise preference data or a trained reward model, Diffusion-KTO learns from per-image binary feedback (a like or dislike for each generated image). This avoids both the cost of collecting pairwise comparisons and the need to train a reward model. The method extends the utility maximization framework of Kahneman-Tversky Optimization (KTO), originally developed for large language models (LLMs), to diffusion models.
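To make the idea of "utility maximization from per-image binary feedback" concrete, here is a small, self-contained sketch in plain Python. It is not the authors' implementation and omits the actual diffusion model. It rests on three stated assumptions: (1) a Diffusion-DPO-style implicit reward computed from the squared denoising errors of the model being tuned and a frozen reference model at one sampled timestep; (2) the logistic (sigmoid) utility associated with KTO; and (3) a hypothetical reference point equal to the batch-mean reward clamped at zero, used as a simple stand-in for KTO's KL-based reference point. The names `Sample`, `implicit_reward`, and `kto_style_loss` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, used here as the utility U."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


@dataclass
class Sample:
    """One generated image with its per-image binary feedback (hypothetical container)."""
    err_policy: float  # squared denoising error ||eps - eps_theta(x_t, c, t)||^2 of the model being tuned
    err_ref: float     # the same error for the frozen reference model (e.g. the original SD v1-5)
    liked: bool        # True = "like", False = "dislike"


def implicit_reward(s: Sample, beta: float) -> float:
    # Assumption: Diffusion-DPO-style implicit reward. It is positive when the
    # tuned model denoises this image better than the reference model does.
    return beta * (s.err_ref - s.err_policy)


def kto_style_loss(
    batch: Sequence[Sample],
    beta: float = 1.0,
    utility: Callable[[float], float] = sigmoid,
) -> float:
    if not batch:
        raise ValueError("batch must contain at least one sample")
    rewards = [implicit_reward(s, beta) for s in batch]
    # Hypothetical reference point: batch-mean reward clamped at zero. In an
    # autodiff framework it would be detached and treated as a constant.
    ref_point = max(0.0, sum(rewards) / len(rewards))
    total = 0.0
    for s, r in zip(batch, rewards):
        # Liked images gain utility as their reward rises above the reference
        # point; disliked images gain utility as their reward falls below it.
        sign = 1.0 if s.liked else -1.0
        total += utility(sign * (r - ref_point))
    # Maximizing mean utility is the same as minimizing its negative.
    return -total / len(batch)


if __name__ == "__main__":
    batch = [
        Sample(err_policy=0.10, err_ref=0.30, liked=True),
        Sample(err_policy=0.40, err_ref=0.20, liked=False),
    ]
    print(kto_style_loss(batch, beta=1.0))
```

Note that each image contributes to the loss on its own, using only its like/dislike label; no paired comparison between two images is needed. Because `utility` is an ordinary function argument, other utility functions, such as the alternatives the paper explores, can be swapped in for comparison.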
The authors evaluate Diffusion-KTO by fine-tuning Stable Diffusion v1-5 (SD v1-5) on the Pick-a-Pic v2 dataset and report significant improvements in alignment over existing methods, as judged by both human evaluators and automated metrics. The paper also compares several utility functions and includes ablation studies that examine the contribution of each component of the method. Overall, Diffusion-KTO offers a framework for aligning T2I models with human preferences using readily available per-image binary feedback.