Arc2Face: A Foundation Model for ID-Consistent Human Faces

22 Aug 2024 | Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou, Jiankang Deng, Bernhard Kainz, and Stefanos Zafeiriou
Arc2Face is a foundation model for generating identity-consistent human faces. Given the ArcFace embedding of a person, it produces photo-realistic images whose identity is highly similar to that of the input. The model builds on a pre-trained Stable Diffusion model, adapted to generate images conditioned solely on identity vectors. Unlike previous methods that combine identity with text embeddings, Arc2Face relies only on the compact and discriminative features of face recognition (FR) models, so it can generate faces robustly without text descriptions, which suits tasks where identity consistency is crucial. Training uses a large-scale dataset derived from WebFace42M: a significant portion of that database, which offers the identity variation that training requires, is upsampled to higher resolution. In the reported evaluations, Arc2Face preserves identity more consistently than existing methods. Synthetic images generated by Arc2Face are also used to train an FR model, which achieves better results than models trained on existing synthetic datasets. Arc2Face can be combined with ControlNet for spatial control of the output, enabling pose and expression manipulation. The generated images are diverse and realistic while keeping the identity consistent across different conditions. The study highlights the potential of ID-embeddings in face generation and argues that they are better suited to this task than CLIP image or text features. Overall, Arc2Face offers a robust and efficient solution for identity-consistent image synthesis.
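Identity similarity between an input face and a generated face is commonly measured as the cosine similarity of their face recognition embeddings. The snippet below is a minimal sketch of that measurement, not the authors' evaluation code. It assumes 512-dimensional vectors and uses random stand-ins in place of real ArcFace outputs; the function names `l2_normalize` and `cosine_similarity` are hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Stand-in data (assumption): random vectors instead of real FR embeddings.
rng = random.Random(0)
dim = 512  # assumed embedding size
reference = [rng.gauss(0.0, 1.0) for _ in range(dim)]
# A slightly perturbed copy plays the role of "same person, new image".
same_identity = [x + rng.gauss(0.0, 0.1) for x in reference]
# An independent vector plays the role of a different person.
other_identity = [rng.gauss(0.0, 1.0) for _ in range(dim)]

sim_same = cosine_similarity(reference, same_identity)
sim_other = cosine_similarity(reference, other_identity)
print(f"same identity:  {sim_same:.3f}")
print(f"other identity: {sim_other:.3f}")
```

In a real evaluation, the stand-in vectors would be replaced by embeddings extracted from the input photo and from each generated image by an FR network; a higher score for the generated image indicates better identity preservation.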