MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

MaskGAN is a framework for diverse and interactive facial image manipulation. Users edit the semantic mask of a target image, using a source image as a reference, and the model synthesizes a face that follows the edited mask. The framework has two main components: the Dense Mapping Network (DMN), which learns a style mapping between a user-modified mask and a target image, and Editing Behavior Simulated Training (EBST), which models user editing behavior on the source mask to improve robustness. Training proceeds in two stages: DMN is trained first and then refined with EBST.

The architecture combines a Spatial-Aware Style Encoder with an Image Generation Backbone to generate the manipulated face, and an Alpha Blender keeps the manipulation consistent.

The authors also introduce CelebAMask-HQ, a large-scale, high-resolution face dataset with fine-grained mask annotations: over 30,000 face images at 512×512 resolution, labeled with 19 facial component categories.

MaskGAN is evaluated on two tasks, attribute transfer and style copy, using metrics that cover classification accuracy, perceptual quality, and identity preservation. On both tasks it outperforms the state-of-the-art methods it is compared against, producing high-quality outputs that remain robust to user-modified masks. Because editing happens through semantic masks, users can interactively change the shape, location, and category of individual facial components. The work thus contributes a geometry-oriented, mask-based approach to facial image manipulation, together with a large-scale dataset for further research.
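To make the mask-based editing pipeline more concrete, here is a plain-Python sketch of three operations the description relies on: encoding a semantic mask as one binary channel per category (19 here, matching CelebAMask-HQ), a simple geometric edit to one facial component of the kind a user might make, and a pixel-wise alpha blend of two images. The function names `one_hot`, `shift_component`, and `alpha_blend` are hypothetical, and this is not the authors' implementation: EBST's actual simulation of user edits and the Alpha Blender's learned weighting are not reproduced, and alpha is assumed to be a fixed scalar.

```python
"""Illustrative sketch of mask-editing building blocks (not MaskGAN's code).

Assumptions: masks are 2D lists of integer class ids in [0, NUM_CLASSES),
images are 2D lists of floats, and the blend weight alpha is a fixed scalar
rather than a value produced by a trained Alpha Blender.
"""

NUM_CLASSES = 19  # CelebAMask-HQ labels 19 categories


def one_hot(mask, num_classes=NUM_CLASSES):
    """Convert an H x W label map into num_classes binary H x W channels."""
    for row in mask:
        for label in row:
            if not 0 <= label < num_classes:
                raise ValueError(f"label {label} out of range")
    # Channel c is 1 exactly where the mask holds class c.
    return [
        [[1 if label == c else 0 for label in row] for row in mask]
        for c in range(num_classes)
    ]


def shift_component(mask, label, dy, dx, background=0):
    """Hypothetical user edit: translate every pixel of one class by (dy, dx).

    Returns a new mask; the input is not modified. Pixels vacated by the
    component become `background`, pixels shifted outside the image are
    dropped, and the moved component overwrites whatever class it lands on.
    """
    h, w = len(mask), len(mask[0])
    # Start from a copy with the component erased.
    out = [[background if v == label else v for v in row] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == label:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = label
    return out


def alpha_blend(img_a, img_b, alpha):
    """Pixel-wise convex combination: alpha * a + (1 - alpha) * b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
```

For example, `shift_component([[0, 0, 0], [0, 1, 0], [0, 0, 0]], label=1, dy=0, dx=1)` moves the single class-1 pixel one column to the right, giving `[[0, 0, 0], [0, 0, 1], [0, 0, 0]]`, and `alpha_blend([[0.0, 1.0]], [[1.0, 1.0]], 0.25)` returns `[[0.75, 1.0]]`. Returning new lists instead of mutating the input keeps the original mask available, which is useful when both the source mask and its edited version are needed downstream.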


1 Apr 2020 | Cheng-Han Lee¹, Ziwei Liu², Lingyun Wu¹, Ping Luo³