Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control


9 Jul 2024 | Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu
This paper introduces FaceAdapter, an efficient and effective adapter for pre-trained diffusion models that enables high-precision, high-fidelity face editing for face reenactment and face swapping. FaceAdapter is designed to decouple the control of identity, target structure, and attributes, allowing both tasks to be performed with a single model. The adapter consists of three main components: a Spatial Condition Generator (SCG) that provides precise landmarks and background, an Identity Encoder that transfers face embeddings into the text space, and an Attribute Controller that integrates spatial conditions and detailed attributes.

FaceAdapter achieves comparable or superior performance to fully fine-tuned models in motion control precision, identity retention, and generation quality. It integrates seamlessly with various StableDiffusion models and is trained efficiently by freezing the parameters of the denoising U-Net, which preserves prior knowledge and prevents overfitting. Because the adapter is lightweight and plug-and-play, a single model handles both face reenactment and face swapping.

Experiments show that FaceAdapter outperforms state-of-the-art methods in image quality, motion control accuracy, and identity preservation. It also handles large changes in facial shape and pose, and it can be applied to forgery detection to strengthen the identification of deepfakes.

However, the unified model has limited temporal stability in video face reenactment and swapping, which will require additional temporal fine-tuning in future work. Potential misuse of FaceAdapter could lead to privacy invasion, the spread of misinformation, and other ethical concerns; these risks can be mitigated by incorporating visible and invisible digital watermarks.
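To make the decoupled conditioning described above concrete, here is a minimal PyTorch sketch of how the three adapter components could be wired around a frozen denoising U-Net. It assumes a diffusers-style `UNet2DConditionModel` interface and SD 1.5-like dimensions (768-dim text embeddings, 512-dim face-recognition embeddings); all module names, layer sizes, and injection points are hypothetical simplifications, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class IdentityEncoder(nn.Module):
    """Projects a face-recognition embedding into a few pseudo text tokens
    so identity can be injected through the U-Net's cross-attention layers."""

    def __init__(self, id_dim=512, text_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Sequential(
            nn.Linear(id_dim, text_dim * num_tokens),
            nn.LayerNorm(text_dim * num_tokens),
        )

    def forward(self, id_embed):                        # (B, id_dim)
        tokens = self.proj(id_embed)                    # (B, num_tokens * text_dim)
        return tokens.view(id_embed.size(0), self.num_tokens, -1)


class AttributeController(nn.Module):
    """Encodes the spatial condition from the SCG (landmarks + background,
    assumed here to already be at latent resolution) into a latent-shaped
    residual that carries target structure and detailed attributes."""

    def __init__(self, cond_channels=6, latent_channels=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(cond_channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_channels, 3, padding=1),
        )

    def forward(self, spatial_cond):                    # (B, cond_channels, h, w)
        return self.encoder(spatial_cond)               # (B, latent_channels, h, w)


class FaceAdapterSketch(nn.Module):
    """Wires the adapter modules around a frozen pre-trained denoising U-Net."""

    def __init__(self, unet, id_dim=512, text_dim=768):
        super().__init__()
        self.unet = unet.requires_grad_(False)          # keep the diffusion prior frozen
        self.id_encoder = IdentityEncoder(id_dim, text_dim)
        self.attr_controller = AttributeController()

    def forward(self, noisy_latents, timesteps, id_embed, spatial_cond, text_embeds):
        id_tokens = self.id_encoder(id_embed)                   # identity -> text space
        structure = self.attr_controller(spatial_cond)          # structure + attributes
        context = torch.cat([text_embeds, id_tokens], dim=1)    # cross-attention context
        # Hypothetical injection: add the structural residual to the noisy latents
        # and pass the identity tokens alongside the text embeddings.
        return self.unet(
            noisy_latents + structure,
            timesteps,
            encoder_hidden_states=context,
        ).sample
```

Because identity, structure, and attributes enter through separate paths in this sketch, the same frozen backbone can in principle serve both reenactment (same identity, new structure) and swapping (new identity, preserved background), which mirrors the single-model claim above.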
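The summary also emphasizes that training is parameter-efficient because the pre-trained denoising U-Net stays frozen and only the adapter modules are updated. Below is a hedged sketch of one training step under that setup, assuming a diffusers-style noise scheduler (e.g. `DDPMScheduler`) and the `FaceAdapterSketch` module from the previous snippet; the loss and data pipeline are simplified illustrations, not the paper's training recipe.

```python
import torch
import torch.nn.functional as F


def build_optimizer(adapter, lr=1e-5):
    # Only adapter parameters are trainable; the frozen U-Net contributes none.
    trainable = [p for p in adapter.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)


def train_step(adapter, noise_scheduler, optimizer, batch):
    # batch: VAE latents of the target image plus the adapter conditions.
    latents, id_embed, spatial_cond, text_embeds = batch

    # Standard diffusion objective: add noise at a random timestep and
    # train the (adapter-conditioned) U-Net to predict that noise.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.size(0),), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    pred = adapter(noisy_latents, timesteps, id_embed, spatial_cond, text_embeds)
    loss = F.mse_loss(pred, noise)

    loss.backward()          # gradients flow only into the adapter modules
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Since gradients never reach the U-Net in this setup, the adapter remains a small, separately stored set of weights, which is consistent with the lightweight, plug-and-play integration with different StableDiffusion checkpoints described above.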