Semantic Image Inpainting with Deep Generative Models


13 Jul 2017 | Raymond A. Yeh*, Chen Chen*, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do
This paper proposes a novel method for semantic image inpainting with deep generative models. Semantic inpainting fills large missing regions of an image based on the available visual data and the image's semantics; existing methods often produce unsatisfactory results because they lack this high-level context. The proposed method generates the missing content by conditioning on the available data: given a trained generative model, it searches the latent image manifold for the encoding closest to the corrupted image, guided by a context loss and a prior loss, and then passes that encoding through the generator to infer the missing content. Because no information about the holes is required during training, the method applies to arbitrarily structured missing regions. Experiments on three datasets show that it successfully predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming state-of-the-art methods.

The method builds on generative adversarial networks. A generator G and a discriminator D are both trained on uncorrupted data; after training, G maps a point z drawn from the latent space to an image that mimics samples from the data distribution. Inpainting then amounts to recovering the encoding z that is "closest" to the corrupted image while being constrained to the latent manifold. Finding z is formulated as an optimization problem with two terms: a context loss, a weighted ℓ1-norm difference between the generated image and the uncorrupted portion of the input, which captures the information in the available data; and a prior loss, defined as the GAN loss used for training the discriminator D, which penalizes unrealistic images.
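To make the latent-space search concrete, the following is a minimal PyTorch-style sketch of the optimization described above, assuming a pretrained generator G and discriminator D, a corrupted image y, a binary mask (1 for known pixels, 0 for the hole), and a prior-loss weight lam. The function name, the hyperparameter values, and the use of a plain binary mask as a stand-in for the paper's importance-weighted mask are illustrative assumptions, not the authors' released implementation.

import torch

def inpaint(G, D, y, mask, latent_dim=100, lam=0.003, steps=1500, lr=0.01):
    # Search for the latent code z whose generated image matches the known
    # pixels of y (context loss) while still looking real to D (prior loss).
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        g = G(z)  # candidate image produced by the generator
        # Context loss: l1 difference on the observed (unmasked) pixels;
        # the binary mask stands in for the paper's importance weighting.
        context_loss = (mask * (g - y)).abs().sum()
        # Prior loss: minimizing log(1 - D(G(z))) pushes D(G(z)) toward 1,
        # i.e. toward images the discriminator considers realistic.
        prior_loss = torch.log(1.0 - D(g) + 1e-8).mean()
        (context_loss + lam * prior_loss).backward()
        optimizer.step()

    with torch.no_grad():
        g = G(z)
    # Composite: keep the known pixels of y and fill the hole from G(z).
    return mask * y + (1.0 - mask) * g

In practice the recovered content is merged with the original image through a blending step (the paper uses Poisson blending) so that the seam between known and generated pixels is not visible.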
The method is evaluated on three datasets: CelebA, SVHN, and Stanford Cars. It produces more realistic images than the state-of-the-art Context Encoder (CE) and handles arbitrary missing regions, including random holes, without retraining the network, which is a significant advantage for inpainting applications. It also achieves higher PSNR values in some cases, although quantitative results do not always reflect the true performance of different methods when the ground truth is not unique. The method can handle complex scenes as well, though its prediction quality depends strongly on the generative model and the training procedure. Overall, it generates realistic content for large missing regions and achieves pixel-level photorealism.