27 Feb 2024 | Daiqing Li, Aleks Kamko, Ehsan Akhgari, Ali Sabet, Linmiao Xu, Suhail Doshi
This paper presents three key insights to enhance the aesthetic quality of text-to-image generative models, focusing on Playground v2.5. The authors address three critical aspects: improving color and contrast, generating images across multiple aspect ratios, and aligning model outputs with human preferences. They demonstrate significant improvements in these areas through extensive analysis and experiments. Playground v2.5 outperforms both open-source models like SDXL and closed-source systems like DALL-E 3 and Midjourney v5.2 in terms of aesthetic quality. The model is open-sourced and available on HuggingFace, with extensions for use in popular community tools like A1111 and ComfyUI. The paper also introduces a new automatic evaluation benchmark, MJHQ-30K, to assess the model's performance. Overall, Playground v2.5 aims to provide a leading solution for text-to-image generation, focusing on realism, visual fidelity, and human-centric details.