Descriptive Image Quality Assessment in the Wild


12 Jun 2024 | Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong†, Tianfan Xue†
The paper introduces DepictQA-Wild, a Vision Language Model (VLM)-based Image Quality Assessment (IQA) model designed to align with human perception and capture the multifaceted nature of IQA tasks. Current VLM-based IQA methods are limited in functionality and performance due to their narrow focus on specific sub-tasks and settings, as well as issues with dataset coverage, scale, and quality.

To address these challenges, DepictQA-Wild adopts a multi-functional IQA task paradigm that covers both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios. The authors construct a comprehensive, large-scale, and high-quality dataset, DQ-495K, by enhancing data quality through ground-truth-informed construction and scaling the dataset up to 495K samples. They retain image resolution during training to better handle resolution-related quality issues and estimate confidence scores to filter out low-quality responses.

Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and the proprietary GPT-4V across tasks including distortion identification, instant rating, and reasoning. The model's advantages are further confirmed in real-world applications such as assessing web-downloaded images and ranking model-processed images. The paper also discusses related work, task paradigms, dataset construction, model design, and ablation studies, highlighting the limitations and future directions of the field.
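The multi-functional task paradigm can be pictured as a small set of orthogonal switches on each query: task type, response style, and reference availability. The sketch below is a minimal illustration of that 2x2x2 space; the names (`IQAQuery`, `build_prompt`) and the prompt wording are hypothetical, not the authors' actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    ASSESS = "assessment"    # judge a single image
    COMPARE = "comparison"   # judge image A against image B


@dataclass
class IQAQuery:
    """One query under the multi-functional paradigm: task type,
    response style, and whether a pristine reference is available."""
    task: Task
    detailed: bool           # brief verdict vs. detailed reasoning
    has_reference: bool      # full-reference vs. non-reference setting


def build_prompt(q: IQAQuery) -> str:
    """Assemble a text prompt matching the query configuration
    (hypothetical wording, for illustration only)."""
    parts = []
    if q.has_reference:
        parts.append("A pristine reference image is provided.")
    if q.task is Task.ASSESS:
        parts.append("Evaluate the quality of Image A.")
    else:
        parts.append("Compare the quality of Image A and Image B.")
    parts.append("Explain your reasoning in detail."
                 if q.detailed else "Answer briefly.")
    return " ".join(parts)


# Example: a detailed, non-reference comparison (one of the eight settings).
print(build_prompt(IQAQuery(Task.COMPARE, detailed=True, has_reference=False)))
```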
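Confidence-based filtering of responses can be approximated from token-level probabilities: average the log-probabilities of the generated tokens into a sequence-level score and discard responses that fall below a threshold. A minimal sketch, assuming access to per-token log-probs and a hypothetical threshold of 0.5; the paper's exact scoring rule may differ.

```python
import math
from typing import List, Tuple


def response_confidence(token_logprobs: List[float]) -> float:
    """Sequence-level confidence as the geometric mean of token
    probabilities, i.e. exp(mean of token log-probs)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def filter_responses(responses: List[Tuple[str, List[float]]],
                     threshold: float = 0.5) -> List[str]:
    """Keep only responses whose confidence clears the threshold
    (the threshold value is an assumption, not from the paper)."""
    return [text for text, logprobs in responses
            if response_confidence(logprobs) >= threshold]


# Toy example: two candidate answers with per-token log-probs.
candidates = [
    ("Image A is sharper.", [-0.1, -0.2, -0.1, -0.3]),  # conf ~ 0.84, kept
    ("Image B is sharper.", [-1.5, -2.0, -1.8, -1.2]),  # conf ~ 0.20, dropped
]
print(filter_responses(candidates))  # -> ['Image A is sharper.']
```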
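Retaining image resolution means feeding the vision encoder images at (or near) their native size rather than resizing everything to a fixed square, since a fixed resize destroys exactly the resolution-dependent distortions (blur, compression blocking) the model should detect. Below is a minimal sketch using Pillow, assuming a ViT-style encoder with 14-pixel patches; the padding strategy is illustrative, not the paper's exact pipeline.

```python
from PIL import Image

PATCH = 14  # assumed ViT patch size


def pad_to_patch_multiple(img: Image.Image) -> Image.Image:
    """Pad (not resize) an image so both sides are multiples of the
    patch size, preserving native resolution and aspect ratio."""
    w, h = img.size
    new_w = -(-w // PATCH) * PATCH  # ceil to the next multiple
    new_h = -(-h // PATCH) * PATCH
    canvas = Image.new(img.mode, (new_w, new_h))  # zero-filled padding
    canvas.paste(img, (0, 0))
    return canvas


img = Image.new("RGB", (500, 333))      # stand-in for a real photo
print(pad_to_patch_multiple(img).size)  # -> (504, 336)
```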