Improving Text-to-Image Consistency via Automatic Prompt Optimization

March 27, 2024 | Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
The paper introduces OPT2I, a framework that improves prompt-image consistency in text-to-image (T2I) models by optimizing prompts. OPT2I leverages a large language model (LLM) to iteratively generate revised prompts that maximize a consistency score, improving the alignment between the input prompt and the generated image. The framework starts with a user-provided prompt and uses the LLM to suggest alternative prompts that better capture the user's intent. This process is repeated until a maximum number of iterations is reached or a target consistency score is achieved. The framework is designed to be versatile, working with various T2I models and LLMs without requiring parameter updates.
Extensive experiments on the MSCOCO and PartiPrompts datasets show that OPT2I can boost the initial consistency score by up to 24.9% while preserving the Fréchet Inception Distance (FID) and increasing recall between generated and real data. The paper also discusses the trade-offs between image quality, diversity, and prompt-image consistency, and provides qualitative examples to illustrate the effectiveness of OPT2I.
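The iterative loop summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `optimize_prompt`, `propose`, and `score` are hypothetical names, `propose` stands in for the LLM, and `score` stands in for generating images with a T2I model and measuring their consistency with the user's prompt. Passing the ranked history of past attempts to `propose` is also an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# One entry per attempted prompt: (prompt text, consistency score).
History = List[Tuple[str, float]]


def optimize_prompt(
    user_prompt: str,
    propose: Callable[[str, History], List[str]],
    score: Callable[[str, str], float],
    max_iters: int = 10,
    target_score: float = 1.0,
) -> Tuple[str, float, History]:
    """Search for a revised prompt with a higher consistency score.

    `propose` and `score` are hypothetical stand-ins for the LLM and for
    the T2I model plus consistency metric; no model parameters change.
    """
    # Score the user's prompt first so every revision has a baseline.
    history: History = [(user_prompt, score(user_prompt, user_prompt))]
    best_prompt, best_score = history[0]

    for _ in range(max_iters):
        # Stop early once the target consistency score is reached.
        if best_score >= target_score:
            break
        # Show the stand-in LLM the past attempts, best first.
        ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
        for candidate in propose(user_prompt, ranked):
            # Consistency is always measured against the user's prompt.
            candidate_score = score(user_prompt, candidate)
            history.append((candidate, candidate_score))
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score

    return best_prompt, best_score, history


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_propose(user_prompt: str, ranked: History) -> List[str]:
    # Pretend the LLM elaborates on the best prompt seen so far.
    return [ranked[0][0] + " (detailed)"]


def toy_score(user_prompt: str, candidate: str) -> float:
    # Pretend each elaboration raises consistency by 0.25, capped at 1.0.
    return min(1.0, 0.25 * (1 + candidate.count("(detailed)")))


best, best_score, history = optimize_prompt(
    "a red cube on a blue sphere", toy_propose, toy_score, target_score=0.75
)
print(best, best_score, len(history))
```

In a real setup, `score` would wrap the expensive steps (image generation and consistency evaluation), so the two stopping conditions, the iteration budget and the target score, bound the total cost of the search.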