21 Sep 2018 | Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo
StarGAN is a novel and scalable generative adversarial network (GAN) for multi-domain image-to-image translation. Unlike existing methods, which require a separate model for every pair of domains, StarGAN learns the mappings among all domains with a single generator and discriminator. This allows simultaneous training on multiple datasets with different domains within one network, yielding higher translation quality and the flexibility to translate an input image into any desired target domain.

Domain information is encoded as a label vector, and target domain labels are sampled randomly during training so the generator learns to translate an input flexibly into any domain. A mask vector additionally enables joint training on datasets with different label sets: it tells the model which labels are valid for the current sample, so unknown labels from the other dataset are ignored.
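To make the label and mask mechanics concrete, here is a minimal PyTorch sketch of how the two vectors might be assembled and fed to the generator. The helper names and the label sizes (5 CelebA attributes, 8 RaFD expressions) are illustrative choices for this sketch, not code from the official implementation; the spatial-replication trick, however, is how StarGAN conditions its generator.

```python
import torch

# Illustrative label sizes: 5 CelebA attributes, 8 RaFD expressions,
# plus a 2-bit one-hot mask saying which dataset the labels come from.
N_CELEBA, N_RAFD = 5, 8

def build_label(target, dataset):
    """Assemble the label-plus-mask vector for a batch (hypothetical
    helper). Labels belonging to the other dataset are zeroed out, and
    the mask marks which block of labels is valid."""
    n = target.size(0)
    celeba = target if dataset == "CelebA" else torch.zeros(n, N_CELEBA)
    rafd = target if dataset == "RaFD" else torch.zeros(n, N_RAFD)
    mask = torch.zeros(n, 2)
    mask[:, 0 if dataset == "CelebA" else 1] = 1.0
    return torch.cat([celeba, rafd, mask], dim=1)

def condition(x, c):
    """Replicate the label vector spatially and concatenate it to the
    image channel-wise, which is how the generator receives its target."""
    c_map = c.view(c.size(0), c.size(1), 1, 1).expand(-1, -1, x.size(2), x.size(3))
    return torch.cat([x, c_map], dim=1)

x = torch.randn(4, 3, 128, 128)                      # a batch of input images
c = build_label(torch.ones(4, N_CELEBA), "CelebA")   # (4, 5 + 8 + 2) = (4, 15)
g_in = condition(x, c)                               # (4, 3 + 15, 128, 128)
```

Zeroing out the other dataset's labels, rather than dropping them, keeps the input dimensionality fixed, which is what lets one generator serve both datasets.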
StarGAN is evaluated on facial attribute transfer and facial expression synthesis, generating realistic translations while preserving facial identity. Trained jointly on CelebA and RaFD, it shows clear improvements over the baseline models. Architecturally, the generator is built from down-sampling convolutions, residual blocks, and up-sampling convolutions, while the discriminator is a PatchGAN with an auxiliary domain-classification head. Training uses the Adam optimizer with the learning rates and batch sizes reported in the paper, and the model produces high-quality results on both datasets. StarGAN's ability to handle multiple domains and datasets within a single model, together with its image quality and efficiency, makes it a valuable contribution to image-to-image translation.
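As a companion to the conditioning sketch above, here is a minimal PyTorch rendering of the generator just described: two down-sampling convolutions, a residual bottleneck, and two up-sampling convolutions. This is a sketch under the paper's reported defaults (64 base filters, 6 residual blocks, instance normalization, Tanh output), not the official implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block used in the generator's bottleneck."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1, bias=False),
            nn.InstanceNorm2d(dim, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, 1, 1, bias=False),
            nn.InstanceNorm2d(dim, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Down-sampling convs -> residual bottleneck -> up-sampling convs.
    c_dim is the length of the label-plus-mask vector (15 = 5 + 8 + 2
    in the joint CelebA + RaFD setup sketched earlier)."""
    def __init__(self, conv_dim=64, c_dim=15, n_res=6):
        super().__init__()
        layers = [nn.Conv2d(3 + c_dim, conv_dim, 7, 1, 3, bias=False),
                  nn.InstanceNorm2d(conv_dim, affine=True),
                  nn.ReLU(inplace=True)]
        dim = conv_dim
        for _ in range(2):  # down-sample twice: 128 -> 64 -> 32
            layers += [nn.Conv2d(dim, dim * 2, 4, 2, 1, bias=False),
                       nn.InstanceNorm2d(dim * 2, affine=True),
                       nn.ReLU(inplace=True)]
            dim *= 2
        layers += [ResidualBlock(dim) for _ in range(n_res)]
        for _ in range(2):  # up-sample back to the input resolution
            layers += [nn.ConvTranspose2d(dim, dim // 2, 4, 2, 1, bias=False),
                       nn.InstanceNorm2d(dim // 2, affine=True),
                       nn.ReLU(inplace=True)]
            dim //= 2
        layers += [nn.Conv2d(dim, 3, 7, 1, 3, bias=False), nn.Tanh()]
        self.main = nn.Sequential(*layers)

    def forward(self, x):
        # x already has the label/mask channels concatenated.
        return self.main(x)

G = Generator()
g_in = torch.randn(4, 3 + 15, 128, 128)   # image plus label/mask channels
y = G(g_in)                               # (4, 3, 128, 128), values in [-1, 1]
opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
```

The paper trains with Adam at a 1e-4 learning rate and beta values of (0.5, 0.999); the rest of the training loop, with its adversarial, domain-classification, and reconstruction losses, is omitted from this sketch.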