10 Jun 2024 | Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong
This paper introduces margin-aware preference optimization (MaPO), a method for aligning text-to-image diffusion models with preference data without relying on a reference model. The authors target the problem of reference mismatch: when the preference data and the reference model come from distinct distributions, reference-based objectives can hinder alignment. MaPO jointly maximizes the likelihood margin between preferred and dispreferred image sets and the likelihood of the preferred set itself, letting the model learn general stylistic features and preferences. The method is evaluated on two new pairwise preference datasets, Pick-Style and Pick-Safety, which simulate reference-mismatch scenarios; MaPO significantly improves alignment on both and outperforms existing approaches such as Diffusion-DPO and supervised fine-tuning (SFT). Because it dispenses with the reference model, MaPO is also memory-efficient and reduces training time by 14.5%. On general preference alignment, it outperforms 21 of 25 state-of-the-art text-to-image diffusion models on the Imgsys benchmark, aligning well with human preferences and improving the aesthetics of generated images. Its modest memory and data requirements make it practical for a range of domain-specific preference data, and the results underline the importance of addressing reference mismatch in preference optimization.
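To make the objective concrete, here is a minimal PyTorch-style sketch of a margin-based loss in the spirit of MaPO. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the `beta` and `gamma` scales, the batch keys, and the use of a per-sample epsilon-prediction MSE as a likelihood proxy are all assumptions for this sketch. The one structural point it does reflect from the summary is that the loss combines a margin term between preferred and dispreferred images with a likelihood term on the preferred set, and that no frozen reference model appears anywhere.

```python
import torch
import torch.nn.functional as F

def per_sample_denoising_loss(unet, noisy_latents, noise, timesteps, text_emb):
    """Per-sample epsilon-prediction MSE, used here as a proxy for the
    model's negative log-likelihood of an image under the current policy.
    Assumes a diffusers-style UNet2DConditionModel interface."""
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_emb).sample
    # Reduce over channel/spatial dims but keep the batch dim, so each
    # image gets its own scalar loss for the pairwise margin below.
    return F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))

def mapo_style_loss(unet, batch, beta=0.1, gamma=1.0):
    """Margin-aware objective sketch (assumed form, not the paper's exact loss):
    maximize the likelihood margin between preferred ("chosen") and
    dispreferred ("rejected") images while also maximizing the likelihood
    of the preferred set. Note there is no reference model term."""
    # Same noise and timesteps for both sides so the comparison is fair.
    loss_w = per_sample_denoising_loss(
        unet, batch["chosen_noisy_latents"], batch["noise"],
        batch["timesteps"], batch["text_emb"])
    loss_l = per_sample_denoising_loss(
        unet, batch["rejected_noisy_latents"], batch["noise"],
        batch["timesteps"], batch["text_emb"])
    # A lower denoising loss means a higher likelihood, so rewarding a large
    # (loss_l - loss_w) gap pushes preferred images to be more likely.
    margin_term = -F.logsigmoid(beta * (loss_l - loss_w)).mean()
    # Likelihood term on the preferred set, keeping it likely in absolute
    # terms rather than only relative to the dispreferred set.
    sft_term = loss_w.mean()
    return margin_term + gamma * sft_term
```

One design point worth noting: because both terms depend only on the current model's losses, dropping the reference model removes a second forward pass and its activations from each training step, which is consistent with the memory savings and the 14.5% reduction in training time reported above.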