19 Mar 2024 | Kartik Narayan*, Vibashan VS*, Rama Chellappa, and Vishal M. Patel
FaceXFormer is a unified transformer model for eight facial analysis tasks: face parsing, landmark detection, head pose estimation, attribute recognition, age, gender, and race estimation, and landmark visibility prediction. Unlike conventional methods that rely on task-specific designs, it uses a transformer-based encoder-decoder architecture in which each task is treated as a learnable token, so all tasks are handled within a single framework. A parameter-efficient decoder, FaceX, processes face and task tokens together, which lets the model learn robust, generalized face representations across tasks. The model is trained on ten datasets with task-specific annotations and evaluated on both intra-dataset and cross-dataset benchmarks. It achieves state-of-the-art performance on multiple benchmarks and outperforms existing multi-task models and specialized models in several tasks, with competitive performance in face parsing, attribute recognition, and head pose estimation, while running in real time at 37 FPS. It also handles "in-the-wild" images well and can perform multiple facial analysis tasks simultaneously. The parameter-efficient decoder and multi-scale encoder both contribute to this efficiency and performance. FaceXFormer is lightweight, returns output in response to task-specific queries, and can be integrated with existing face detection systems to provide additional insights. It is also suitable for surveillance, where it can provide auxiliary information for subject analysis and image retrieval.
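The idea of treating each task as a token that is processed together with face tokens can be illustrated with a small sketch. This is not the authors' FaceX decoder: it assumes a single unparameterized self-attention step over the concatenated tokens, random vectors in place of learned task tokens and encoder features, and hypothetical names throughout (`TASKS`, `attend`, `joint_decode`). It only shows how a task-specific query selects which task tokens enter the decoder and how each one comes back mixed with face information.

```python
import math
import random

# Hypothetical task names, one token per task (eight tasks, as in the summary).
TASKS = ["parsing", "landmarks", "head_pose", "attributes",
         "age", "gender", "race", "visibility"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Single-head dot-product attention with no learned projections (assumption)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        # Each output vector is a weighted average of the key/value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, keys_values))
                    for d in range(dim)])
    return out


def joint_decode(face_tokens, task_tokens):
    """Concatenate task and face tokens, let all tokens attend to each other, then split."""
    tokens = task_tokens + face_tokens
    mixed = attend(tokens, tokens)
    n = len(task_tokens)
    return mixed[:n], mixed[n:]


rng = random.Random(0)
dim = 8
# Stand-in for encoder output: 16 face tokens of width `dim`.
face_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
# Stand-in for learnable task tokens; in a real model these would be trained.
task_tokens = {name: [rng.gauss(0, 1) for _ in range(dim)] for name in TASKS}

# A task-specific query: only the requested task tokens are decoded.
requested = ["head_pose", "age"]
task_out, face_out = joint_decode(face_tokens,
                                  [task_tokens[t] for t in requested])
per_task = dict(zip(requested, task_out))
```

In a full model, each entry of `per_task` would be passed to a small task-specific head (for example a regressor for head pose), and the updated face tokens would feed dense outputs such as parsing masks; both are omitted here.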
The model's performance is evaluated across all of these tasks and compared with other models, showing state-of-the-art results in several of them. Its ability to handle multiple tasks within a single framework is a significant contribution to the field of facial analysis.