17 Apr 2024 | Nicolas Chahine, Marcos V. Conde, Daniela Carfora, Gabriel Pacianotto, Benoit Pochon, Sira Ferradans, Radu Timofte, Zhichao Duan, Xinrui Xu, Yipo Huang, Quan Yuan, Xiangfei Sheng, Zhichao Yang, Leida Li, Fangyuan Kong, Yifang Xu, Wei Sun, Yanwei Jiang, Haotian Fan, Zicheng Zhang, Jun Jia, Yingjie Zhou, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Xiaoqi Wang, Yun Zhang, Zewen Chen, Wen Wang, Bing Li, Zhongpeng Ji
This paper reviews the NTIRE 2024 Portrait Quality Assessment Challenge, which aims to develop efficient deep neural networks capable of estimating the perceptual quality of real portrait photos under diverse conditions. The challenge required submissions to generalize to various scenes, lighting conditions, movement, and blur. Fourteen hundred participants registered, and thirty-five submitted results. The top five submissions are reviewed to gauge the current state-of-the-art in Portrait Quality Assessment.
The challenge focuses on evaluating the overall quality of portraits and their generalization to unseen conditions. The evaluation procedure consists of two phases: a preliminary testing phase using the public PIQ23 dataset and a final testing phase using a private dataset of 96 single-person scenes. The evaluation metrics include Spearman Rank Correlation Coefficient (SRCC), Pearson Linear Correlation Coefficient (PLCC), and Kendall Rank Correlation Coefficient (KRCC).
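For reference, all three correlation metrics are straightforward to compute with `scipy.stats`. The sketch below is illustrative only; the `mos` (ground-truth mean opinion scores) and `pred` (model predictions) arrays are hypothetical placeholders, not challenge data:

```python
# Minimal sketch of the challenge's three evaluation metrics via scipy.stats.
# The score arrays are hypothetical examples, not actual challenge results.
import numpy as np
from scipy import stats

mos = np.array([3.2, 1.8, 4.5, 2.9, 3.7])   # hypothetical ground-truth scores
pred = np.array([3.0, 2.1, 4.2, 3.1, 3.5])  # hypothetical model predictions

srcc, _ = stats.spearmanr(pred, mos)   # Spearman: monotonic (rank) agreement
plcc, _ = stats.pearsonr(pred, mos)    # Pearson: linear correlation
krcc, _ = stats.kendalltau(pred, mos)  # Kendall: pairwise ordering agreement

print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}  KRCC={krcc:.3f}")
```

SRCC and KRCC reward a model that ranks images in the right order regardless of the score scale, while PLCC additionally rewards a linear fit to the ground-truth scores, which is why challenges typically report all three.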
Several novel frameworks were proposed to address the challenges of domain shifts and generalization. These include:
1. **RQ-Net**: A method for robust cross-scene relative quality assessment that combines global and local quality-perception branches.
2. **Rank-based Vision Transformer Network**: An approach based on MSTRIQ, using a merged ranking loss and data augmentation.
3. **PQE**: A two-branch model that considers both facial and full-image characteristics (a sketch of this pattern follows the list).
4. **MoNet**: A mean-opinion network that collects diverse opinions through multi-view attention learning.
5. **Scene Adaptive Global Context and Local Facial Perception Network**: A model that adaptively evaluates global and local quality based on scene classification.
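A recurring design across these entries (most explicitly in RQ-Net, PQE, and the scene-adaptive network) is a two-branch architecture that scores the full frame and a face crop separately before fusing the two estimates. The sketch below illustrates that pattern only; the backbone choice, fusion head, and all module names are assumptions for illustration, not any team's actual implementation:

```python
# Hedged sketch of the two-branch (global image + local face crop) pattern
# shared by several entries. All names and the fusion strategy are
# illustrative assumptions, not a submitted architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchPortraitIQA(nn.Module):
    def __init__(self):
        super().__init__()
        # Global branch: sees the whole portrait (composition, exposure, noise).
        self.global_backbone = models.resnet18(weights=None)
        self.global_backbone.fc = nn.Identity()  # expose 512-d features
        # Local branch: sees only the face crop (skin texture, eye sharpness).
        self.local_backbone = models.resnet18(weights=None)
        self.local_backbone.fc = nn.Identity()
        # Fusion head regresses a single quality score from both views.
        self.head = nn.Sequential(
            nn.Linear(512 + 512, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, full_image, face_crop):
        g = self.global_backbone(full_image)  # (B, 512) global features
        l = self.local_backbone(face_crop)    # (B, 512) facial features
        return self.head(torch.cat([g, l], dim=1)).squeeze(-1)

model = TwoBranchPortraitIQA()
full = torch.randn(2, 3, 224, 224)   # dummy full portraits
face = torch.randn(2, 3, 224, 224)   # dummy face crops
scores = model(full, face)           # (2,) predicted quality scores
```

Note that rank-oriented entries such as RQ-Net and the rank-based ViT train with pairwise ranking objectives (e.g., PyTorch's `nn.MarginRankingLoss`) rather than direct score regression, which makes quality comparisons transferable across scenes with different score distributions.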
The paper provides detailed descriptions of the methods, including their architectures, training procedures, and implementation details. It also includes performance metrics and comparisons, highlighting the strengths and limitations of each approach. The results show that while all methods struggle with generalization to new scenes, some models, such as RQ-Net and PQE, achieve better performance in specific scenarios.