18 Jun 2024 | Zhiqiu Lin¹, Deepak Pathak¹, Baiqi Li¹, Jiayao Li¹, Xide Xia², Graham Neubig¹, Pengchuan Zhang²*, Deva Ramanan¹*
The paper introduces VQAScore, a new metric for evaluating the alignment between generated images and text prompts. VQAScore uses a visual-question-answering (VQA) model to compute the probability of a "Yes" answer to a question like "Does this figure show [text]?" This approach is simpler than existing methods yet achieves state-of-the-art results across multiple image-text alignment benchmarks. The authors also introduce GenAI-Bench, a more challenging benchmark with 1,600 compositional text prompts that require advanced reasoning skills. GenAI-Bench includes human ratings for leading image and video generation models, providing a comprehensive evaluation of text-to-visual generation. VQAScore can also be applied to video and 3D models, showing superior performance over existing metrics. The paper highlights the limitations of current evaluation methods and proposes VQAScore as a robust alternative, along with GenAI-Bench, to advance the scientific evaluation of generative models.
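The core computation behind VQAScore can be sketched as follows: given the VQA model's logits for the "Yes" and "No" answer tokens, the score is the softmax probability assigned to "Yes". This is a minimal illustration, not the authors' implementation; the logit values below are hypothetical stand-ins for what a real VQA model would produce when asked "Does this figure show [text]?".

```python
import math

def vqascore(yes_logit: float, no_logit: float) -> float:
    """Probability of a 'Yes' answer via a numerically stable
    softmax over the two answer-token logits."""
    m = max(yes_logit, no_logit)          # subtract max to avoid overflow
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Hypothetical logits from a VQA model asked
# "Does this figure show [the prompt text]?"
score = vqascore(yes_logit=3.2, no_logit=-1.1)
```

A higher score indicates the model is more confident the image matches the prompt; equal logits yield a score of 0.5.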