9 Apr 2024 | Hatef Otroshi Shahrezaei, Christophe Ecabert, Anjith George, Alexander Unnervik, Sébastien Marcel, Nicolò Di Domenico, Guido Borghi, Davide Maltoni, Fadi Boutros, Julia Vogel, Naser Damer, Ángela Sánchez-Pérez, Enrique Mas-Candela, Jorge Calvo-Zaragoza, Bernardo Biesseck, Pedro Vidal, Roger Granada, David Menotti, Ivan DeAndres-Tame, Simone Maurizio La Cava, Sara Concas, Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Gianpaolo Perelli, Giulia Orrù, Gian Luca Marcialis, Julian Fierrez
The paper presents the Synthetic Data for Face Recognition (SDFR) Competition, held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024). The competition aimed to investigate the use of synthetic data for training face recognition models, addressing the legal, ethical, and privacy concerns associated with large-scale web-crawled datasets. The competition was divided into two tasks: Task 1 involved a fixed face recognition backbone and a limited dataset size, while Task 2 allowed more flexibility in model backbones, datasets, and training pipelines. Participants were encouraged to use new and existing synthetic datasets to train their models, with the goal of improving performance compared to baseline models trained on real and synthetic datasets.
The submitted models were evaluated on seven benchmark datasets: LFW, CFP-FP, CPLFW, AgeDB-30, CALFW, IJB-B, and IJB-C. The results showed a gap between models trained on real data and those trained on synthetic data, although some teams achieved competitive performance using synthetic data alone. The BioLab team, for example, performed strongly on all datasets, including challenging ones such as IJB-B and IJB-C, by training on a combination of synthetic datasets.
The paper also evaluates the submissions on the Racial Faces in-the-Wild (RFW) dataset to assess their performance across demographic groups. All top-performing submissions performed better on the Caucasian group than on the African group, highlighting the need for responsible synthetic dataset generation to mitigate bias.
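As a rough illustration of how a per-group comparison of this kind can be made, the sketch below computes verification accuracy separately for each demographic group and reports the gap between the best and worst group. It is a minimal sketch under stated assumptions, not the competition's evaluation code: the function names, the fixed similarity threshold, the group labels, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def per_group_accuracy(pairs, threshold):
    """Verification accuracy per group.

    pairs: iterable of (group, similarity, is_same_identity) tuples.
    A pair is predicted "same identity" when similarity >= threshold.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, similarity, is_same in pairs:
        predicted_same = similarity >= threshold
        correct[group] += int(predicted_same == is_same)
        total[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(accuracies):
    """Difference between the best and worst per-group accuracy."""
    return max(accuracies.values()) - min(accuracies.values())


# Toy, made-up similarity scores for illustration only (not results from the paper).
toy_pairs = [
    ("group_a", 0.82, True),
    ("group_a", 0.10, False),
    ("group_a", 0.55, True),
    ("group_a", 0.20, False),
    ("group_b", 0.75, True),
    ("group_b", 0.45, False),
    ("group_b", 0.30, True),
    ("group_b", 0.05, False),
]

acc = per_group_accuracy(toy_pairs, threshold=0.4)
gap = accuracy_gap(acc)
print(acc, gap)
```

A single shared threshold is used here for simplicity; a smaller gap between groups would indicate more uniform behavior across demographics.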
Finally, the paper outlines the current state of research in synthetic data generation for face recognition and identifies open challenges, such as increasing inter-class and intra-class variations in synthetic datasets and scaling synthetic datasets to generate more images. The competition and the findings contribute to the ongoing efforts to develop privacy-friendly and high-performing face recognition models.