The paper introduces A-Bench, a benchmark designed to evaluate how well large multi-modal models (LMMs) can assess AI-generated images (AIGIs). It addresses a limitation of existing benchmarks, which are built largely from natural-captured content rather than AIGIs. A-Bench is organized around two key principles: high-level semantic understanding and low-level visual quality perception. The benchmark comprises 2,864 AIGIs produced by 16 text-to-image models, each paired with question-answer sets annotated by human experts, and is used to evaluate 18 leading LMMs. The results show that LMMs still fall short of human performance, highlighting the need for further development of their evaluation capabilities. The benchmark is available at https://github.com/Q-Future/A-Bench.
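
As a rough illustration of the evaluation protocol summarized above, the sketch below shows how an A-Bench-style multiple-choice question might be posed to an LMM and scored against the expert-annotated answer. The item schema, the `query_lmm` function, and the prompt format are hypothetical placeholders for illustration, not the paper's actual interface or data format.

```python
from dataclasses import dataclass

@dataclass
class ABenchItem:
    """One benchmark item: an AIGI plus an expert-annotated multiple-choice question.
    (Hypothetical schema; the released A-Bench data format may differ.)"""
    image_path: str
    question: str
    choices: list[str]
    answer: str  # ground-truth choice annotated by human experts


def query_lmm(image_path: str, prompt: str) -> str:
    """Placeholder for calling a multi-modal model; swap in a real LMM client here."""
    raise NotImplementedError


def evaluate(items: list[ABenchItem]) -> float:
    """Return the fraction of questions the model answers correctly."""
    correct = 0
    for item in items:
        # Format the choices as lettered options (A, B, C, ...).
        options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(item.choices))
        prompt = f"{item.question}\n{options}\nAnswer with the letter of the best choice."
        reply = query_lmm(item.image_path, prompt).strip().upper()
        gt_letter = chr(65 + item.choices.index(item.answer))
        correct += reply.startswith(gt_letter)
    return correct / len(items)
```

Under this setup, human performance would be measured by substituting expert responses for the model's replies, allowing the accuracy gap reported in the paper to be computed on the same question set.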