AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

28 Nov 2017 | Tao Xu*, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
The paper introduces AttnGAN, an Attentional Generative Adversarial Network for fine-grained text-to-image generation. AttnGAN employs a novel attentional generative network that synthesizes fine-grained details in different sub-regions of the image by paying attention to the relevant words in the natural-language description. In addition, a Deep Attentional Multimodal Similarity Model (DAMSM) is proposed to compute a fine-grained image-text matching loss for training the generator. AttnGAN significantly outperforms previous state-of-the-art methods, boosting the best reported inception score by 14.14% on the CUB dataset and by 170.25% on the more challenging COCO dataset. The paper also includes a detailed analysis of the attention layers, showing that AttnGAN automatically attends to the relevant words when generating different parts of the image.
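To make the word-level attention idea concrete, the sketch below computes a word-context vector for each image sub-region by attending over the word embeddings of the description. This is a minimal PyTorch illustration under assumed tensor shapes and names (h, e, and word_attention are hypothetical, not the authors' code); DAMSM relies on a similar region-word similarity, followed by a cosine-based matching score between the image and the sentence.

```python
import torch
import torch.nn.functional as F

def word_attention(h, e):
    """Minimal sketch of word-level attention (assumed shapes).

    h: image sub-region features, shape (batch, D, N) -- N sub-regions
    e: word embeddings,           shape (batch, D, T) -- T words,
       assumed already projected into the same D-dimensional space

    Returns a word-context matrix c of shape (batch, D, N): for each
    sub-region, a weighted sum of word vectors, where the weights
    reflect how relevant each word is to that sub-region.
    """
    # Similarity between every sub-region and every word: (batch, N, T)
    s = torch.bmm(h.transpose(1, 2), e)
    # Normalize over the words, so each sub-region distributes its
    # attention across the whole sentence
    alpha = F.softmax(s, dim=2)
    # Word-context vector per sub-region: (batch, D, N)
    c = torch.bmm(e, alpha.transpose(1, 2))
    return c

# Toy usage with assumed dimensions
h = torch.randn(2, 256, 64)   # 64 sub-regions, 256-d features
e = torch.randn(2, 256, 18)   # 18 words, projected to 256-d
c = word_attention(h, e)
print(c.shape)                # torch.Size([2, 256, 64])
```

In the paper's multi-stage setup, such word-context vectors are combined with the image features at each generation stage, so later stages can refine sub-region details conditioned on the most relevant words.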