Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

October 28–November 1, 2024 | Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai
The paper addresses the challenge of assessing the quality of AI-generated images (AGIs) by focusing on the semantic content and coherence of the images. Traditional deep neural network (DNN)-based image quality assessment (IQA) models, which are effective for natural scene images, struggle with AGIs due to the semantic inaccuracies inherent in the generation process. To overcome this, the authors introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) framework. This framework leverages a pre-trained large multi-modality model (LMM) to extract fine-grained semantic features and integrates these features with quality-aware features extracted by a traditional DNN-based IQA model, specifically MANIQA. The LMM, mPLUG-Owl2, is guided by carefully designed text prompts to capture semantic information, which is then merged with MANIQA's features using a mixture of experts (MoE) structure. The experimental results on two AI-generated content datasets and two traditional IQA datasets demonstrate that MA-AGIQA achieves state-of-the-art performance and superior generalization capabilities, particularly in assessing the quality of AGIs where semantic content plays a crucial role. The contributions of the paper include a systematic analysis of the limitations of traditional DNN-based IQA models, the introduction of MA-AGIQA, and extensive evaluation showing its effectiveness and superior performance over existing methods.
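To make the fusion step concrete, below is a minimal PyTorch sketch of a two-expert mixture-of-experts head in the spirit of MA-AGIQA: one expert processes quality-aware features (as would come from MANIQA) and one processes semantic features (as would come from mPLUG-Owl2), with a learned gate weighting the two per image before a final quality-score regression. All module names, feature dimensions, and layer sizes here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class MoEFusion(nn.Module):
    """Sketch of a two-expert mixture-of-experts fusion head.

    Combines quality-aware features (stand-in for MANIQA output) with
    semantic features (stand-in for mPLUG-Owl2 output) and regresses a
    single quality score. Dimensions and layer sizes are assumptions.
    """

    def __init__(self, quality_dim=768, semantic_dim=768, hidden_dim=256):
        super().__init__()
        # One expert per feature stream.
        self.quality_expert = nn.Sequential(
            nn.Linear(quality_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.semantic_expert = nn.Sequential(
            nn.Linear(semantic_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Gating network produces a per-image weight for each expert.
        self.gate = nn.Sequential(
            nn.Linear(quality_dim + semantic_dim, 2), nn.Softmax(dim=-1),
        )
        self.regressor = nn.Linear(hidden_dim, 1)

    def forward(self, quality_feat, semantic_feat):
        # Gate on the concatenated features, then mix the expert outputs.
        weights = self.gate(torch.cat([quality_feat, semantic_feat], dim=-1))
        fused = (weights[:, 0:1] * self.quality_expert(quality_feat)
                 + weights[:, 1:2] * self.semantic_expert(semantic_feat))
        return self.regressor(fused).squeeze(-1)  # predicted quality score


# Toy usage with random tensors in place of the two feature extractors.
quality_feat = torch.randn(4, 768)   # would come from MANIQA
semantic_feat = torch.randn(4, 768)  # would come from mPLUG-Owl2
model = MoEFusion()
print(model(quality_feat, semantic_feat).shape)  # torch.Size([4])
```

The design intuition matches the paper's motivation: the gate lets the model lean on semantic features when generation artifacts are semantic (e.g., incoherent content) and on conventional quality features when distortions are low-level.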