Context Encoders: Feature Learning by Inpainting

21 Nov 2016 | Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
This paper introduces Context Encoders, a convolutional neural network trained to predict missing regions of an image from their surrounding context. The model is trained with a joint objective: a pixel-wise reconstruction (L2) loss and an adversarial loss. Because many plausible completions exist for a given hole, the reconstruction loss alone averages over these modes and blurs the output; the adversarial term handles the multimodality better and produces sharper results.

Training images are presented with regions removed, and the network regresses to the missing pixel values. The encoder compresses the surrounding context into a compact latent feature representation, and the decoder uses that representation to synthesize the missing image content. The model is closely related to autoencoders, but differs in that filling a hole requires a deeper semantic understanding of the scene and the ability to synthesize high-level features over large spatial extents. A minimal sketch of this setup is given below.
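The following is a minimal PyTorch sketch of the setup described above, not the paper's exact architecture: the layer sizes, the 64x64 input with a central 32x32 hole, and the tiny placeholder discriminator are all illustrative assumptions. The reconstruction-heavy 0.999/0.001 loss weighting mirrors the mix reported in the paper, but treat the exact values here as illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Encoder-decoder that regresses the missing region from masked context."""

    def __init__(self, latent_dim=512):
        super().__init__()
        # Encoder: 64x64 masked image -> compact 1x1 latent feature.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # -> 8x8
            nn.Conv2d(256, latent_dim, 8), nn.ReLU(),               # -> 1x1 latent
        )
        # Decoder: latent feature -> 32x32 RGB prediction for the hole.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 8), nn.ReLU(),               # -> 8x8
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(), # -> 16x16
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh(),   # -> 32x32
        )

    def forward(self, masked_img):
        return self.decoder(self.encoder(masked_img))

# Placeholder discriminator judging 32x32 region predictions as real/fake.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # -> 16x16
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # -> 8x8
    nn.Flatten(), nn.Linear(128 * 8 * 8, 1),                       # real/fake logit
)

# One generator training step (a real GAN loop would also update `disc`).
model = ContextEncoder()
imgs = torch.rand(4, 3, 64, 64) * 2 - 1    # toy batch, scaled to [-1, 1]
masked = imgs.clone()
masked[:, :, 16:48, 16:48] = 0.0           # drop the central 32x32 region
target = imgs[:, :, 16:48, 16:48]          # ground-truth missing pixels

pred = model(masked)
rec_loss = ((pred - target) ** 2).mean()   # pixel-wise L2 on the hole
adv_loss = F.binary_cross_entropy_with_logits(
    disc(pred), torch.ones(4, 1))          # encourage fooling the discriminator
loss = 0.999 * rec_loss + 0.001 * adv_loss
loss.backward()
```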
The representations learned this way capture both the appearance and the semantics of visual structures. Evaluated as CNN pre-training, Context Encoders are competitive with state-of-the-art unsupervised and self-supervised methods on classification, object detection, and semantic segmentation (a sketch of this transfer setup closes this summary). For semantic inpainting itself, the model often produces realistic image content, either stand-alone or as initialization for non-parametric methods, and its learned features improve nearest-neighbor-based inpainting.

The paper also surveys related work on unsupervised learning, weakly-supervised and self-supervised learning, and image generation, highlighting spatial context as a freely available source of supervisory signal and comparing the proposed method with existing approaches. It concludes that Context Encoders advance the state of the art in semantic inpainting and learn feature representations that are competitive with models trained with auxiliary supervision.
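As a hedged illustration of the pre-training use case mentioned above, the sketch below reuses the encoder from the earlier `ContextEncoder` sketch as initialization for a downstream classifier. The 10-class linear head is hypothetical; the paper fine-tunes standard classification, detection, and segmentation pipelines rather than this toy head.

```python
import torch
import torch.nn as nn

# `ContextEncoder` is the class from the sketch above; in practice its
# weights would be loaded from the inpainting pre-training run rather
# than freshly initialized as they are here.
pretrained = ContextEncoder()

classifier = nn.Sequential(
    pretrained.encoder,   # context features learned by inpainting
    nn.Flatten(),         # (N, 512, 1, 1) -> (N, 512)
    nn.Linear(512, 10),   # hypothetical 10-class task head
)

logits = classifier(torch.rand(4, 3, 64, 64) * 2 - 1)  # -> (4, 10)
```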