StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

28 Jun 2018 | Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Senior Member, IEEE, Xiaogang Wang, Member, IEEE, Xiaolei Huang, Member, IEEE, Dimitris N. Metaxas*, Fellow, IEEE
This paper introduces StackGANs, a novel approach to generating high-resolution, photo-realistic images with stacked generative adversarial networks (GANs). The authors propose two models: StackGAN-v1 and StackGAN-v2.

StackGAN-v1 is a two-stage GAN that first generates low-resolution images from text descriptions and then refines them into high-resolution images: a Stage-I GAN sketches the basic shape and colors, and a Stage-II GAN corrects defects and adds details. The paper also introduces Conditioning Augmentation (CA), a technique that stabilizes training and increases the diversity of generated images by introducing random perturbations in the latent conditioning manifold.

StackGAN-v2 is an advanced multi-stage GAN architecture designed for both conditional and unconditional image generation tasks. It consists of multiple generators and discriminators arranged in a tree-like structure, where images at different scales are generated from different branches. Jointly approximating multiple distributions at different scales makes training more stable. A color-consistency regularization term further improves the quality of generated images by keeping color statistics consistent across scales.

Experiments demonstrate that StackGANs significantly outperform other state-of-the-art methods in generating photo-realistic images, achieving higher Inception Scores, lower Fréchet Inception Distance (FID), and better human rankings.
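The Conditioning Augmentation idea can be sketched as follows: instead of feeding the generator a fixed text embedding, a learned layer predicts a mean and log-variance, and the conditioning vector is sampled from the resulting Gaussian, with a KL penalty keeping it close to a standard normal. This is a minimal NumPy sketch; the affine weights `W` and `b` stand in for a trained layer and are hypothetical.

```python
import numpy as np

def conditioning_augmentation(text_embedding, W, b, rng):
    """Sketch of Conditioning Augmentation (CA).

    A learned affine layer (W, b are placeholder trained weights) maps the
    text embedding to a mean mu and a log-variance; the conditioning vector
    is sampled from N(mu, sigma^2), which smooths the conditioning manifold.
    """
    h = text_embedding @ W + b            # shape: (2 * c_dim,)
    c_dim = h.shape[0] // 2
    mu, log_var = h[:c_dim], h[c_dim:]
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(c_dim)      # random perturbation
    c_hat = mu + sigma * eps              # sampled conditioning vector
    # KL divergence from N(0, I), added to the generator loss as a regularizer
    kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)
    return c_hat, kl
```

Because the same text description now maps to a small region of conditioning vectors rather than a single point, the generator sees more varied input during training, which is what stabilizes it and diversifies its outputs.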
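The color-consistency regularization compares simple color statistics of images generated at adjacent scales. A plausible sketch, assuming images are arrays in [0, 1] and using illustrative weights `lam1` and `lam2`, penalizes differences in the per-image mean color and pixel covariance:

```python
import numpy as np

def color_consistency_loss(img_lo, img_hi, lam1=1.0, lam2=5.0):
    """Sketch of a color-consistency regularization term.

    Penalizes differences between the mean color and the pixel covariance of
    images generated at adjacent scales, so that higher-resolution branches
    keep the basic color structure of lower-resolution ones.
    Images are arrays of shape (H, W, 3); lam1 and lam2 are illustrative.
    """
    def stats(img):
        pixels = img.reshape(-1, 3)          # flatten to (num_pixels, 3)
        mu = pixels.mean(axis=0)             # mean color
        cov = np.cov(pixels, rowvar=False)   # 3x3 color covariance
        return mu, cov

    mu_lo, cov_lo = stats(img_lo)
    mu_hi, cov_hi = stats(img_hi)
    return (lam1 * np.sum((mu_hi - mu_lo) ** 2)
            + lam2 * np.sum((cov_hi - cov_lo) ** 2))
```

The term is zero when the two scales agree on mean color and covariance, and grows as the high-resolution branch drifts away from the low-resolution sample it should refine.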
The proposed methods are evaluated on various datasets, including CUB, Oxford-102, COCO, LSUN bedroom, LSUN church, and ImageNet cat, showing their effectiveness in text-to-image synthesis and unconditional image synthesis tasks.
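One of the reported metrics, the Inception Score, can be computed from class probabilities produced by an Inception classifier on generated images; it rewards images that are individually confidently classified while the set as a whole covers many classes. A minimal sketch, assuming the (N, K) probability matrix has already been computed:

```python
import numpy as np

def inception_score(probs):
    """Sketch of the Inception Score, IS = exp(E_x KL(p(y|x) || p(y))).

    probs is an (N, K) array of class probabilities p(y|x) that would come
    from an Inception classifier applied to N generated images (assumed
    precomputed here). Higher scores indicate sharper, more diverse images.
    """
    marginal = probs.mean(axis=0)  # p(y), the marginal class distribution
    # Per-image KL divergence between p(y|x) and p(y); eps avoids log(0)
    eps = 1e-12
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

A uniform probability matrix yields a score of 1 (no information beyond the marginal), while perfectly one-hot, evenly spread predictions over K classes yield a score of K, which is the metric's upper bound.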