Improving Text-to-Image Consistency via Automatic Prompt Optimization

March 27, 2024 | Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
This paper introduces OPT2I, an optimization-by-prompting framework for text-to-image (T2I) models that improves prompt-image consistency without fine-tuning or any other parameter updates. A large language model (LLM) iteratively revises the user prompt to maximize a consistency score between the generated images and the original prompt. At each iteration, the LLM receives the history of prompt-score pairs and suggests revised prompts, which are then used to generate and score new images. The process stops when a maximum number of iterations is reached or a target consistency score is achieved. The framework is designed to work with different T2I models, LLMs, and consistency metrics.
The paper evaluates OPT2I on two datasets, MSCOCO and PartiPrompts. OPT2I improves prompt-image consistency by up to 24.9% in DSG score while maintaining FID and increasing recall between generated and real data. It outperforms paraphrasing baselines, including random paraphrasing and Promptist, and is robust to the choice of LLM and consistency metric. Qualitatively, optimized prompts often emphasize elements of the initial prompt that were missing from the generated images, either by adding detail or by reordering elements to highlight them.
The paper also examines the trade-offs between prompt-image consistency, image quality, and diversity, measured with FID, precision, and recall: while OPT2I improves consistency, it may reduce precision and increase recall. Finally, it discusses the limitations of current consistency metrics, which may not always accurately reflect the quality of generated images.
Overall, the study highlights the effectiveness of OPT2I in improving T2I consistency and its potential for building more reliable and robust T2I systems.
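The iterative loop described in the summary above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `optimize_prompt`, its parameters, and the three toy stand-ins (`toy_generate`, `toy_score`, `toy_propose`) are hypothetical names introduced here only for illustration. A real setup would plug in an actual T2I model, an LLM, and a consistency metric such as DSG.

```python
from typing import Callable, List, Set, Tuple

History = List[Tuple[str, float]]  # (prompt, consistency score) pairs


def optimize_prompt(
    user_prompt: str,
    generate_images: Callable[[str], list],
    consistency_score: Callable[[str, list], float],
    propose_prompts: Callable[[str, History], List[str]],
    max_iterations: int = 10,
    target_score: float = 1.0,
) -> Tuple[str, float, History]:
    """Hypothetical sketch of an optimization-by-prompting loop."""
    # Score the user's own prompt first; it seeds the history.
    best_prompt = user_prompt
    best_score = consistency_score(user_prompt, generate_images(user_prompt))
    history: History = [(best_prompt, best_score)]

    for _ in range(max_iterations):
        if best_score >= target_score:
            break  # target consistency reached
        # The proposer (an LLM in the real system) sees only prompt-score pairs.
        for candidate in propose_prompts(user_prompt, history):
            images = generate_images(candidate)
            # Images are always scored against the ORIGINAL user prompt.
            score = consistency_score(user_prompt, images)
            history.append((candidate, score))
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt, best_score, history


# --- Toy stand-ins (assumptions, not real models) ---------------------------

def toy_generate(prompt: str) -> List[Set[str]]:
    # Toy "T2I model": renders the first two words, plus any repeated word.
    words = prompt.lower().split()
    return [set(words[:2]) | {w for w in words if words.count(w) >= 2}]


def toy_score(user_prompt: str, images: List[Set[str]]) -> float:
    # Toy consistency metric: fraction of user-prompt words present per image.
    wanted = set(user_prompt.lower().split())
    return sum(len(wanted & img) / len(wanted) for img in images) / len(images)


def toy_propose(user_prompt: str, history: History) -> List[str]:
    # Toy "LLM": take the best prompt so far and emphasize one more word.
    best = max(history, key=lambda pair: pair[1])[0]
    words = user_prompt.split()
    emphasized = words[(len(history) - 1) % len(words)]
    return [best + " " + emphasized]


best, score, history = optimize_prompt(
    "red cube blue sphere", toy_generate, toy_score, toy_propose
)
print(best, score, len(history))
```

In this toy run the revised prompt repeats the words the "model" had dropped, loosely mirroring the qualitative finding that optimized prompts emphasize missing elements. The stopping rule (iteration budget or target score) follows the summary; everything else is an assumption.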