10 Jun 2024 | Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, and Mingyuan Zhou
Diffusion-RPO is a method for aligning text-to-image (T2I) diffusion models with human preferences. It applies preference optimization at each diffusion sampling step and uses contrastive weighting over similar prompt-image pairs, leveraging both identical and semantically related pairs across modalities. The authors also introduce Style Alignment, an evaluation task designed to address the high cost, low reproducibility, and limited interpretability of existing human preference metrics. In experiments on Stable Diffusion 1.5 and XL-1.0, Diffusion-RPO outperforms existing methods on both automated human preference evaluations and style alignment, demonstrating its effectiveness for fine-tuning diffusion-based T2I models.
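
To make the training signal concrete, below is a minimal PyTorch sketch of what a per-step, DPO-style preference loss with contrastive pair weighting could look like. This is an illustrative assumption, not the paper's actual implementation: the function name diffusion_rpo_loss, the weights argument, and the beta temperature are all hypothetical names introduced here.

```python
import torch
import torch.nn.functional as F

def diffusion_rpo_loss(eps_win, eps_lose,          # model noise predictions (B, C, H, W)
                       ref_eps_win, ref_eps_lose,  # frozen reference-model predictions
                       noise_win, noise_lose,      # true noise injected into each image
                       weights,                    # contrastive pair weights, shape (B,)
                       beta=1.0):
    # Per-sample denoising errors (MSE against the injected noise).
    err_win = ((eps_win - noise_win) ** 2).mean(dim=(1, 2, 3))
    err_lose = ((eps_lose - noise_lose) ** 2).mean(dim=(1, 2, 3))
    ref_err_win = ((ref_eps_win - noise_win) ** 2).mean(dim=(1, 2, 3))
    ref_err_lose = ((ref_eps_lose - noise_lose) ** 2).mean(dim=(1, 2, 3))

    # Implicit reward margin: the fine-tuned model should reduce the denoising
    # error on the preferred image more than on the dispreferred one,
    # relative to the frozen reference model.
    margin = (err_lose - ref_err_lose) - (err_win - ref_err_win)

    # Contrastive weighting lets semantically related (but non-identical)
    # prompt-image pairs contribute with reduced strength.
    return -(weights * F.logsigmoid(beta * margin)).mean()

if __name__ == "__main__":
    # Toy usage with random latents: batch of 4 preference pairs.
    B, C, H, W = 4, 4, 64, 64
    tensors = [torch.randn(B, C, H, W) for _ in range(6)]
    print(diffusion_rpo_loss(*tensors, weights=torch.ones(B)))
```

The weighting term captures the intuition that preferences attached to related-but-not-identical prompts are informative but noisier than preferences over the exact same prompt, so they should count with reduced strength.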