Descriptive Image Quality Assessment in the Wild


12 Jun 2024 | Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong†, Tianfan Xue†
The paper introduces DepictQA-Wild, a Vision Language Model (VLM)-based Image Quality Assessment (IQA) model designed to align with human perception and capture the multifaceted nature of IQA tasks. Current VLM-based IQA methods are limited in functionality and performance due to their narrow focus on specific sub-tasks and settings, as well as issues with dataset coverage, scale, and quality.

To address these challenges, DepictQA-Wild adopts a multi-functional IQA task paradigm that covers both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios. The authors construct a comprehensive, large-scale, and high-quality dataset, DQ-495K, by enhancing data quality through ground-truth-informed construction and scaling the dataset up to 495K samples. They retain image resolution during training to better handle resolution-related quality issues and estimate confidence scores to filter out low-quality responses.

Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and the proprietary GPT-4V across tasks including distortion identification, instant rating, and reasoning. The model's advantages are further confirmed in real-world applications such as assessing web-downloaded images and ranking model-processed images. The paper also discusses related work, task paradigms, dataset construction, model design, and ablation studies, highlighting the limitations and future directions of the field.
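The multi-functional task paradigm can be pictured as a small set of orthogonal switches on each query: task type, response style, and reference availability. The sketch below is a minimal illustration of that 2x2x2 space; the names (`IQAQuery`, `build_prompt`) and the prompt wording are hypothetical, not the authors' actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    ASSESS = "assessment"    # judge a single image
    COMPARE = "comparison"   # judge image A against image B


@dataclass
class IQAQuery:
    """One query under the multi-functional paradigm: task type,
    response style, and whether a pristine reference is available."""
    task: Task
    detailed: bool           # brief verdict vs. detailed reasoning
    has_reference: bool      # full-reference vs. non-reference setting


def build_prompt(q: IQAQuery) -> str:
    """Assemble a text prompt matching the query configuration
    (hypothetical wording, for illustration only)."""
    parts = []
    if q.has_reference:
        parts.append("A pristine reference image is provided.")
    if q.task is Task.ASSESS:
        parts.append("Evaluate the quality of Image A.")
    else:
        parts.append("Compare the quality of Image A and Image B.")
    parts.append("Explain your reasoning in detail."
                 if q.detailed else "Answer briefly.")
    return " ".join(parts)


# Example: a detailed, non-reference comparison (one of the eight settings).
print(build_prompt(IQAQuery(Task.COMPARE, detailed=True, has_reference=False)))
```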
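Confidence-based filtering of responses can be approximated from token-level probabilities: average the log-probabilities of the generated tokens into a sequence-level score and discard responses that fall below a threshold. A minimal sketch, assuming access to per-token log-probs and a hypothetical threshold of 0.5; the paper's exact scoring rule may differ.

```python
import math
from typing import List, Tuple


def response_confidence(token_logprobs: List[float]) -> float:
    """Sequence-level confidence as the geometric mean of token
    probabilities, i.e. exp(mean of token log-probs)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def filter_responses(responses: List[Tuple[str, List[float]]],
                     threshold: float = 0.5) -> List[str]:
    """Keep only responses whose confidence clears the threshold
    (the threshold value is an assumption, not from the paper)."""
    return [text for text, logprobs in responses
            if response_confidence(logprobs) >= threshold]


# Toy example: two candidate answers with per-token log-probs.
candidates = [
    ("Image A is sharper.", [-0.1, -0.2, -0.1, -0.3]),  # conf ~ 0.84, kept
    ("Image B is sharper.", [-1.5, -2.0, -1.8, -1.2]),  # conf ~ 0.20, dropped
]
print(filter_responses(candidates))  # -> ['Image A is sharper.']
```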
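Retaining image resolution means feeding the vision encoder images at (or near) their native size rather than resizing everything to a fixed square, since a fixed resize destroys exactly the resolution-dependent distortions (blur, compression blocking) the model should detect. Below is a minimal sketch using Pillow, assuming a ViT-style encoder with 14-pixel patches; the padding strategy is illustrative, not the paper's exact pipeline.

```python
from PIL import Image

PATCH = 14  # assumed ViT patch size


def pad_to_patch_multiple(img: Image.Image) -> Image.Image:
    """Pad (not resize) an image so both sides are multiples of the
    patch size, preserving native resolution and aspect ratio."""
    w, h = img.size
    new_w = -(-w // PATCH) * PATCH  # ceil to the next multiple
    new_h = -(-h // PATCH) * PATCH
    canvas = Image.new(img.mode, (new_w, new_h))  # zero-filled padding
    canvas.paste(img, (0, 0))
    return canvas


img = Image.new("RGB", (500, 333))      # stand-in for a real photo
print(pad_to_patch_multiple(img).size)  # -> (504, 336)
```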