Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model


12 Mar 2024 | Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, and Haibo Zhao
**Authors:** Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, and Haibo Zhao

**Institutions:** Shanghai Jiao Tong University; Xiaohongshu Inc.; Peking University; Shenyang Institute of Automation, Chinese Academy of Sciences; National University of Singapore

**Abstract:** Current makeup transfer methods are limited to simple styles, making them unsuitable for real-world applications. This paper introduces Stable-Makeup, a novel diffusion-based makeup transfer method capable of robustly transferring a wide range of real-world makeup styles onto user-provided faces. Stable-Makeup builds on a pre-trained diffusion model and uses a Detail-Preserving (D-P) makeup encoder to encode makeup details, together with content and structural control modules that preserve the content and structural information of the source image. By incorporating makeup cross-attention layers into the U-Net, Stable-Makeup transfers detailed makeup to the corresponding positions in the source image, and after content-structure decoupling training it maintains the content and facial structure of the source image. Extensive experiments demonstrate that Stable-Makeup achieves state-of-the-art performance, robustness, and generalizability, extending to tasks such as cross-domain makeup transfer and makeup-guided text-to-image generation.

**Keywords:** Makeup Transfer, D-P Makeup Encoder, Diffusion

**Introduction:** Makeup transfer is a significant computer vision task with applications in beauty and virtual try-on systems, but existing methods often fall short when dealing with diverse and intricate makeup styles. This paper addresses that gap with Stable-Makeup, a diffusion-based approach that pairs a pre-trained diffusion model with a Detail-Preserving makeup encoder to capture and transfer detailed makeup information while preserving the content and structure of the source image.

**Methodology:** Stable-Makeup consists of three key components: the Detail-Preserving makeup encoder, makeup cross-attention layers, and content and structural control modules. The D-P makeup encoder extracts multi-scale, spatially aware features of the reference makeup, while the content and structural encoders encode the source image and the facial structure control image, respectively. These features are fed into the U-Net, where the makeup cross-attention layers align the detailed makeup embeddings with the intermediate feature maps of the source image. Content-structure decoupling training further ensures that the generated image retains the content and structure of the source image. A minimal sketch of this injection mechanism follows.
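To make the injection mechanism concrete, here is a minimal PyTorch sketch, not the authors' released code, of how multi-scale tokens from a detail-preserving encoder could be attended to by a U-Net block's spatial features via an added cross-attention layer. Module names, the backbone interface, layer indices, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MakeupCrossAttention(nn.Module):
    """Sketch of a makeup cross-attention layer: flattened U-Net spatial
    features (queries) attend to makeup tokens (keys/values) and the result
    is added back as a residual."""

    def __init__(self, dim: int, makeup_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            dim, num_heads, kdim=makeup_dim, vdim=makeup_dim, batch_first=True
        )

    def forward(self, unet_feats: torch.Tensor, makeup_embeds: torch.Tensor):
        # unet_feats:    (B, H*W, dim)        flattened source-image feature map
        # makeup_embeds: (B, N, makeup_dim)   multi-scale tokens from the encoder
        attended, _ = self.attn(self.norm(unet_feats), makeup_embeds, makeup_embeds)
        return unet_feats + attended  # residual injection of makeup information


class DetailPreservingEncoder(nn.Module):
    """Toy stand-in for the D-P makeup encoder: a ViT-like backbone (assumed
    to return a list of per-layer hidden states) whose intermediate layers
    supply multi-scale, spatially aware tokens."""

    def __init__(self, backbone: nn.Module, layers=(3, 6, 9, 12)):
        super().__init__()
        self.backbone = backbone
        self.layers = layers

    def forward(self, makeup_image: torch.Tensor) -> torch.Tensor:
        hidden_states = self.backbone(makeup_image)  # list of (B, N, C) tensors
        return torch.cat([hidden_states[i] for i in self.layers], dim=1)


# Illustrative usage with random tensors (all dimensions are assumptions):
# feats  = torch.randn(1, 64 * 64, 320)      # one U-Net block's flattened features
# makeup = torch.randn(1, 4 * 257, 1024)     # concatenated multi-scale tokens
# out = MakeupCrossAttention(320, 1024)(feats, makeup)
```

In this sketch the pre-trained U-Net weights would stay untouched; only the added cross-attention layers (and the encoder) carry the makeup signal, which mirrors the paper's idea of aligning detailed makeup embeddings with the source image's intermediate feature maps.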
**Experiments:** The method was evaluated on the CPM-Real dataset and compared with other state-of-the-art methods. Results show that Stable-Makeup outperforms existing methods in makeup detail transfer, content and structure preservation, and user-perceived quality. The method also demonstrates robustness and generalizability, making it suitable for tasks such as cross-domain makeup transfer and makeup-guided text-to-image generation.