14 Jun 2019 | Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena
This paper introduces Self-Attention Generative Adversarial Networks (SAGAN), which incorporate a self-attention mechanism into GANs to model long-range dependencies in image generation. Traditional convolutional GANs rely on local receptive fields, which limits their ability to capture global structure. SAGANs use self-attention to let the generator draw on information from distant regions, improving the global consistency of generated images. The self-attention module computes the response at each position as a weighted sum of the features at all positions, where the attention weights themselves are calculated at only a small computational cost. This allows the generator to coordinate fine details across the entire image, while the discriminator can verify consistency between distant regions.
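As a concrete illustration, here is a minimal PyTorch sketch of such a self-attention block over a convolutional feature map. The query/key/value projections and the C/8 channel reduction follow the paper's description, but the module name and layer choices are illustrative assumptions, not the authors' reference implementation (which was released in TensorFlow):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over a conv feature map (minimal sketch)."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolutions project the feature map into query/key/value
        # spaces; reducing query/key channels to C/8 keeps the cost small.
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # gamma is initialized to zero, so the block starts as an identity
        # mapping and gradually learns to weight non-local evidence.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, -1, n)   # B x C/8 x N
        k = self.key(x).view(b, -1, n)     # B x C/8 x N
        v = self.value(x).view(b, c, n)    # B x C x N
        # attn[i, j]: how strongly position i attends to position j.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # B x N x N
        # Response at each position is a weighted sum over all positions.
        o = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * o + x          # residual connection
```

For example, `SelfAttention(256)(torch.randn(4, 256, 32, 32))` returns a tensor of the same shape, so the block can be dropped between existing convolutional stages of the generator or discriminator.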
SAGANs also apply spectral normalization to the generator as well as the discriminator, which stabilizes training dynamics and improves GAN performance. The proposed SAGAN achieves state-of-the-art results on class-conditional ImageNet generation, boosting the best published Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65. Visualizing the attention layers shows that the generator attends to neighborhoods that correspond to object shapes rather than fixed local regions.
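Spectral normalization constrains each layer's largest singular value to roughly 1 via power iteration, bounding the layer's Lipschitz constant. A minimal sketch using PyTorch's built-in wrapper (the layer shapes here are placeholders, not the paper's architecture):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrap weight-bearing layers so their spectral norm (largest singular
# value) stays near 1 via power iteration; SAGAN applies this to the
# layers of both the generator and the discriminator.
sn_conv = spectral_norm(nn.Conv2d(64, 128, kernel_size=3, padding=1))
sn_linear = spectral_norm(nn.Linear(128, 1))
```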
The paper also explores techniques to stabilize GAN training: spectral normalization for both generator and discriminator, and the two-timescale update rule (TTUR) of separate learning rates for the two networks, which speeds convergence. Ablations show that these techniques significantly improve training stability and performance, and that the self-attention blocks capture long-range dependencies more effectively than residual blocks of similar capacity. By letting both generator and discriminator directly model long-range dependencies in feature maps, SAGANs produce more realistic and diverse images and achieve the best Inception score, intra-FID, and FID on ImageNet, demonstrating their effectiveness in class-conditional image generation.
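In practice, TTUR amounts to giving the discriminator a larger learning rate than the generator, so fewer discriminator steps are needed per generator step. A sketch with the rates reported in the paper (1e-4 for G, 4e-4 for D, Adam with β1 = 0, β2 = 0.9); `generator` and `discriminator` stand in for pre-built modules:

```python
import torch

# Two-timescale update rule: separate optimizers, with the discriminator
# learning four times faster than the generator.
# `generator` and `discriminator` are assumed to be nn.Module instances.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
```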