Self-Attention Generative Adversarial Networks

14 Jun 2019 | Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena
This paper introduces the Self-Attention Generative Adversarial Network (SAGAN), which incorporates a self-attention mechanism into the GAN framework to model long-range dependencies in image generation tasks. Traditional convolutional GANs generate high-resolution details based on spatially local points in lower-resolution feature maps, whereas SAGAN allows details to be generated using cues from all feature locations. The discriminator can likewise check that highly detailed features in distant portions of the image are consistent with each other. The paper also applies spectral normalization to the generator, improving training dynamics. Extensive experiments on the ImageNet dataset show that SAGAN significantly outperforms prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65. Visualization of the attention layers reveals that the generator leverages neighborhoods corresponding to object shapes rather than fixed local regions. The paper also investigates techniques to stabilize GAN training, including spectral normalization and the two-timescale update rule (TTUR), which are shown to be effective in improving training stability and convergence speed.
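To make the mechanism concrete, below is a minimal sketch of a SAGAN-style self-attention block in PyTorch. It follows the paper's description (1x1 convolutions project the feature map into query/key/value spaces, attention weights are computed over all spatial positions, and the attended output is added back through a learnable scale gamma initialized to zero); the channel-reduction factor of 8 for the query/key projections and the omission of a final output projection are simplifications, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Sketch of a SAGAN-style self-attention block (simplified)."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convs project features into query/key/value spaces
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # gamma is learned and starts at 0, so the block begins as an identity
        # and gradually learns to use non-local evidence
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, -1, n)          # B x C/8 x N
        k = self.key(x).view(b, -1, n)            # B x C/8 x N
        v = self.value(x).view(b, -1, n)          # B x C   x N
        # attention over all N = H*W positions: each output location
        # attends to every feature location, not just a local neighborhood
        attn = torch.bmm(q.transpose(1, 2), k)    # B x N x N
        attn = F.softmax(attn, dim=-1)
        out = torch.bmm(v, attn.transpose(1, 2))  # B x C x N
        out = out.view(b, c, h, w)
        return self.gamma * out + x               # residual connection
```

In practice, such a block would be inserted into both the generator and the discriminator at one or more intermediate feature-map resolutions. The other stabilization techniques discussed in the paper map onto standard tooling: spectral normalization can be applied to convolutional layers via torch.nn.utils.spectral_norm, and TTUR amounts to giving the generator and discriminator separate optimizers with different learning rates.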