27 Feb 2024 | Ivan DeAndres-Tame, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia
This study explores the capabilities of ChatGPT, a chatbot built on the multimodal GPT-4 large language model (LLM), in face biometrics tasks such as face verification, soft-biometrics estimation, and explainability. The research evaluates ChatGPT's performance and robustness on popular public benchmarks and compares it with state-of-the-art methods. Key findings include:
1. **Face Verification**: ChatGPT achieves around 94% accuracy on the LFW database, showing potential in controlled environments, but struggles in more challenging scenarios such as QUIS-CAMPI and TinyFaces. Its performance also varies significantly with image quality, pose, and domain disparities.
2. **Soft-Biometrics Estimation**: ChatGPT performs well in estimating soft biometrics such as gender (≈96%), age (≈73%), and ethnicity (≈88%). It is strongest on gender classification and certain ethnicities, although its average accuracy remains below that of specialized models.
3. **Explainability**: ChatGPT provides textual outputs that help explain its decisions, focusing on soft-biometric attributes and detailed facial features. However, these outputs also reflect gender and racial biases that need to be addressed.
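The verification setup above amounts to prompting a multimodal chatbot with two face images and parsing its free-text answer into a binary match decision. The sketch below illustrates one way an evaluation harness might do this; the payload shape, prompt wording, and `parse_decision` heuristic are assumptions for illustration, not the paper's actual protocol.

```python
import re


def build_verification_request(img_a_b64: str, img_b_b64: str) -> list:
    """Build a hypothetical multimodal chat message: a text instruction
    followed by the two base64-encoded face images to compare."""
    prompt = (
        "Do these two images show the same person? "
        "Answer 'yes' or 'no' and briefly explain which facial "
        "features support your decision."
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img_a_b64}"}},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img_b_b64}"}},
        ],
    }]


def parse_decision(response_text: str) -> bool:
    """Map the chatbot's free-text answer to a binary match decision
    by looking for the first standalone 'yes' or 'no'."""
    match = re.search(r"\b(yes|no)\b", response_text.lower())
    return match is not None and match.group(1) == "yes"
```

Forcing a leading yes/no keeps scoring simple, while the explanation that follows is exactly the textual output the study uses to assess explainability.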
The study concludes that while ChatGPT may not match the accuracy of specialized models, it offers a promising initial assessment tool for face biometrics tasks, particularly in zero-shot learning scenarios. Future work will focus on analyzing other popular chatbots for similar tasks.