Arc2Face: A Foundation Model for ID-Consistent Human Faces


22 Aug 2024 | Foivos Paraperas Papantoniou1, Alexandros Lattas1, Stylianos Moschoglou1, Jiankang Deng1, Bernhard Kainz1,2, and Stefanos Zafeiriou1
Arc2Face is a foundation model designed to generate high-quality, photo-realistic images of any subject, conditioned solely on identity (ID) embeddings. The model leverages the ArcFace embedding, which is known for filtering out pose, expression, and contextual information, making it well suited to face recognition. By adapting a pre-trained Stable Diffusion model, Arc2Face generates diverse, realistic images with a high degree of face similarity to the input.
The key innovation lies in projecting ArcFace embeddings into the CLIP latent space, allowing the model to condition on ID features without the need for text descriptions. This approach ensures that the generated images closely match the input ID, achieving superior performance compared to existing methods that combine text and image features. Experiments demonstrate that Arc2Face outperforms state-of-the-art models in ID similarity, diversity, and realism, on both synthetic and real datasets. Additionally, Arc2Face can be combined with spatial control techniques such as ControlNet to generate images with controlled pose and expression. The model's effectiveness is further validated by its ability to improve face recognition performance when recognition networks are trained on synthetic datasets generated with Arc2Face. Overall, Arc2Face represents a significant advancement in ID-conditioned face generation, offering robust and versatile capabilities for a range of applications.
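The conditioning mechanism can be illustrated at the shape level: an ArcFace ID vector (512-d) is mapped to a short sequence of pseudo-tokens in the text-embedding space that Stable Diffusion's U-Net cross-attends to (768-d for SD v1.x), replacing the caption. The sketch below is a minimal, hypothetical illustration of those tensor shapes; in Arc2Face the projection is learned jointly with a fine-tuned CLIP encoder, whereas here the weights are random and the function name is invented for this example.

```python
import numpy as np

def project_id_to_clip_tokens(id_embed, num_tokens=4, clip_dim=768, seed=0):
    """Hypothetical sketch (not the released Arc2Face weights): map a
    512-d ArcFace ID embedding to `num_tokens` pseudo-tokens in the
    768-d CLIP text-embedding space used by Stable Diffusion v1.x,
    so the diffusion U-Net can cross-attend to identity features
    instead of a text caption."""
    rng = np.random.default_rng(seed)
    id_dim = id_embed.shape[-1]
    # In Arc2Face this mapping is learned end-to-end; random weights
    # here only demonstrate the dimensionality change.
    W = rng.standard_normal((id_dim, num_tokens * clip_dim)) / np.sqrt(id_dim)
    tokens = (id_embed @ W).reshape(-1, num_tokens, clip_dim)
    return tokens

# A unit-norm dummy vector standing in for a real ArcFace embedding.
id_vec = np.ones((1, 512)) / np.sqrt(512)
tokens = project_id_to_clip_tokens(id_vec)
print(tokens.shape)  # (1, 4, 768)
```

The resulting `(batch, num_tokens, 768)` tensor has the same layout as CLIP text-encoder output, which is why a pre-trained Stable Diffusion U-Net can consume it through its existing cross-attention layers without architectural changes.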