2AFC Prompting of Large Multimodal Models for Image Quality Assessment

2 Feb 2024 | Hanwei Zhu*, Xiangjie Sui*, Baoliang Chen, Xuelin Liu, Peilin Chen, Yuming Fang, Senior Member, IEEE, and Shiqi Wang, Senior Member, IEEE
This paper explores the image quality assessment (IQA) capabilities of large multimodal models (LMMs) using the two-alternative forced choice (2AFC) method, which is considered the most reliable way to collect human opinions on visual quality. The authors introduce three evaluation criteria (consistency, accuracy, and correlation) to comprehensively assess the IQA performance of five LMMs. Extensive experiments on existing image quality datasets reveal that LMMs generally struggle with IQA tasks, particularly fine-grained quality discrimination, whereas the proprietary model GPT-4V performs outstandingly. The proposed dataset and methods are intended to facilitate future research on more advanced LMMs for IQA. The paper also details the coarse-to-fine pairing rules, maximum a posteriori (MAP) estimation, and the evaluation criteria, along with the experimental setup and results.
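The abstract mentions maximum a posteriori estimation but does not give its formulation. As a rough illustration of how 2AFC judgments can be turned into per-image quality scores, the sketch below assumes a Thurstone Case V observer model, P(i preferred over j) = Phi(mu_i - mu_j), with an independent standard-normal prior on each latent score, fitted by plain gradient ascent. The function `map_scores`, its count-matrix layout, and the step size are hypothetical choices for this example, not the authors' implementation.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def map_scores(counts, prior_var=1.0, lr=0.02, iters=5000):
    """Hypothetical MAP fit of latent quality scores from 2AFC counts.

    counts[i][j] = number of times image i was chosen over image j.
    Assumed model (Thurstone Case V): P(i beats j) = Phi(mu_i - mu_j),
    with an independent N(0, prior_var) prior on every mu_i.
    The log posterior is concave, so small-step gradient ascent converges.
    """
    n = len(counts)
    mu = [0.0] * n
    for _ in range(iters):
        # Gradient of the Gaussian log prior: -mu_i / prior_var.
        grad = [-m / prior_var for m in mu]
        for i in range(n):
            for j in range(n):
                c = counts[i][j]
                if i == j or c == 0:
                    continue
                d = mu[i] - mu[j]
                # d/dd log Phi(d) = phi(d) / Phi(d); floor Phi to avoid dividing by zero.
                g = c * norm_pdf(d) / max(norm_cdf(d), 1e-12)
                grad[i] += g  # winning pushes mu_i up
                grad[j] -= g  # and mu_j down by the same amount
        mu = [m + lr * g for m, g in zip(mu, grad)]
    return mu


if __name__ == "__main__":
    # Toy 2AFC outcomes for three images (made-up counts, for illustration only).
    toy = [
        [0, 8, 9],
        [2, 0, 7],
        [1, 3, 0],
    ]
    print(map_scores(toy))
```

With the toy counts, the fitted scores rank image 0 above image 1 above image 2. The prior is what makes this a MAP rather than a maximum-likelihood fit: it keeps scores finite even when one image wins every comparison, a case in which the unregularized likelihood has no maximizer.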
Understanding 2AFC Prompting of Large Multimodal Models for Image Quality Assessment