Descriptive Image Quality Assessment in the Wild


12 Jun 2024 | Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue
This paper introduces DepictQA-Wild, a Vision Language Model (VLM)-based Image Quality Assessment (IQA) model that addresses the limitations of existing methods in practical applications. Current IQA methods are limited in functionality and performance by their narrow focus on specific tasks, limited dataset coverage, and sub-optimal data quality. To overcome these challenges, the authors propose a multi-functional IQA task paradigm that spans both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios. They also introduce a ground-truth-informed dataset construction approach to improve data quality, and scale the dataset to 495K samples under a brief-detail joint framework, yielding a comprehensive, large-scale, high-quality dataset named DQ-495K.

The model retains the original image resolution during training to better handle resolution-related quality issues, and it estimates a confidence score to filter out low-quality responses. Experimental results show that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and the proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. The model is further validated in real-world applications, including assessing web-downloaded images and ranking model-processed images. The dataset and code will be released on the project page. The paper also covers related work, the task paradigm, dataset construction, model design, and experiments, demonstrating the effectiveness of the proposed method in both benchmark and real-world settings.
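To make the task paradigm concrete, the sketch below encodes its three axes (assessment vs. comparison, brief vs. detailed response, full-reference vs. non-reference) as a data schema. The class and field names are illustrative assumptions made for this summary, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class IQATask:
    """One sample under the multi-functional task paradigm.

    Field names are hypothetical; the paper describes the three axes
    (task type, response length, reference availability) but this
    summary does not give a concrete schema.
    """
    task: Literal["assessment", "comparison"]  # rate one image, or compare A vs. B
    response: Literal["brief", "detailed"]     # short verdict vs. full reasoning
    reference: Optional[str]                   # reference image path; None = non-reference
    image_a: str
    image_b: Optional[str] = None              # only used for comparison tasks

# Example: a non-reference, detailed assessment of a single image.
sample = IQATask(task="assessment", response="detailed",
                 reference=None, image_a="photo.jpg")
```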
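The summary states that the model estimates a confidence score to filter low-quality responses but does not give the formula. A common choice for autoregressive models, assumed here purely for illustration, is the mean per-token probability of the generated text; every name and the threshold below are hypothetical.

```python
import math

def response_confidence(token_logprobs):
    """Mean per-token probability of a generated response.

    token_logprobs: one log-probability per generated token, as
    exposed by most autoregressive decoders.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def filter_responses(responses, threshold=0.5):
    """Keep only responses whose confidence clears the threshold."""
    return [r for r in responses if response_confidence(r["logprobs"]) >= threshold]

# Hypothetical usage: each response carries its decoder log-probabilities.
responses = [
    {"text": "Image A is sharper with less noise.", "logprobs": [-0.1, -0.2, -0.05]},
    {"text": "Hard to tell.", "logprobs": [-1.5, -2.0, -1.8]},
]
print(filter_responses(responses, threshold=0.5))  # keeps only the confident answer
```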