Generative Image Inpainting with Contextual Attention

21 Mar 2018 | Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang
This paper proposes a novel deep generative model for image inpainting built around a contextual attention layer. The model is a feed-forward, fully convolutional neural network that can process images with multiple holes of variable size at arbitrary locations. It works in two stages: a simple dilated convolutional network, trained with a reconstruction loss, first roughs out the missing contents; a refinement network then improves this coarse result using a contextual attention layer that takes the features of known patches as convolutional filters to process the generated patches. The contextual attention layer matches generated patches against known contextual patches, weighs the relevant patches with a channel-wise softmax, and reconstructs the generated patches from the contextual patches; a spatial propagation layer further encourages spatial coherency of the attention. The whole model is trained end to end with reconstruction losses and two Wasserstein GAN losses, one applied to the global image and one to the local hole region.

Experiments on faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that the proposed approach generates higher-quality inpainting results than existing methods on these challenging datasets. The model is also efficient: it can be trained in about one week, compared with the roughly two months reported for prior work. Beyond inpainting, the approach is useful for image editing and computational photography tasks, including image-based rendering, image super-resolution and guided editing.
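To make the match–attend–reconstruct idea concrete, here is a minimal sketch of a contextual attention step in PyTorch. It is not the authors' implementation; the function name, tensor shapes, patch size, and softmax scale are illustrative assumptions. It only shows the core mechanism: known-region patches are extracted and used as convolution kernels to score similarity with hole features, a channel-wise softmax turns the scores into attention, and a transposed convolution with the same patches copies known content back into the hole.

```python
# Minimal sketch of a contextual attention step (PyTorch).
# Shapes, patch size, and names are illustrative assumptions,
# not the paper's exact implementation.
import torch
import torch.nn.functional as F

def contextual_attention(foreground, background, patch_size=3, softmax_scale=10.0):
    """foreground: features of the hole region,  shape (1, C, H, W)
       background: features of the known region, shape (1, C, H, W)"""
    # 1. Extract patches from the background and reshape them into
    #    convolution kernels of shape (num_patches, C, patch_size, patch_size).
    patches = F.unfold(background, kernel_size=patch_size, padding=patch_size // 2)
    num_patches = patches.shape[-1]
    kernels = patches.permute(0, 2, 1).reshape(
        num_patches, background.shape[1], patch_size, patch_size)

    # 2. Match: convolve the foreground with L2-normalized background patches,
    #    giving the cosine similarity to every known patch.
    norm = kernels.flatten(1).norm(dim=1).clamp(min=1e-4).view(-1, 1, 1, 1)
    similarity = F.conv2d(foreground, kernels / norm, padding=patch_size // 2)

    # 3. Attend: channel-wise softmax (one channel per background patch),
    #    scaled to sharpen the attention distribution.
    attention = F.softmax(similarity * softmax_scale, dim=1)

    # 4. Reconstruct: transposed convolution with the unnormalized patches as
    #    filters copies background content into the hole, weighted by attention;
    #    divide to average the overlapping patch contributions.
    out = F.conv_transpose2d(attention, kernels, padding=patch_size // 2)
    return out / (patch_size ** 2), attention

# Tiny usage example on random feature maps.
fg = torch.randn(1, 64, 32, 32)
bg = torch.randn(1, 64, 32, 32)
filled, attn = contextual_attention(fg, bg)
print(filled.shape, attn.shape)  # (1, 64, 32, 32) and (1, 1024, 32, 32)
```

In the full model this operation runs on learned feature maps inside the refinement stage, and the resulting attention map is what the spatial propagation layer smooths to keep neighbouring hole pixels attending to coherent background regions.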