Unsupervised Visual Representation Learning by Context Prediction

16 Jan 2016 | Carl Doersch, Abhinav Gupta, Alexei A. Efros
This paper explores the use of spatial context as a source of supervisory signal for training a rich visual representation. The authors extract random pairs of patches from large, unlabeled image collections and train a convolutional neural network (ConvNet) to predict the position of the second patch relative to the first. They argue that performing well on this task requires the model to learn to recognize objects and their parts. The learned feature representation is shown to capture visual similarity across images, enabling unsupervised visual discovery of objects such as cats, people, and birds from the Pascal VOC 2011 detection dataset. The ConvNet is also integrated into the R-CNN framework, providing a significant boost in performance over a randomly initialized ConvNet and achieving state-of-the-art results among algorithms that use only Pascal-provided training set annotations. The paper discusses related work, including unsupervised representation learning and object discovery, and presents experimental results demonstrating the effectiveness of the proposed method.
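To make the pretext task concrete, the sketch below shows one way the patch-pair sampling could look: draw a center patch from an unlabeled image, draw a second patch from one of the eight surrounding grid positions, and return the pair together with the position index as the classification label. This is a minimal illustration, not the authors' released code; the patch size, gap, and jitter values are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

# Offsets (row, col) of the eight neighboring grid cells, indexed 0..7.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    ( 0, -1),          ( 0, 1),
                    ( 1, -1), ( 1, 0), ( 1, 1)]

def sample_patch_pair(image, patch_size=96, gap=48, jitter=7, rng=None):
    """Return (center_patch, neighbor_patch, label) for one training example.

    `label` in {0,...,7} encodes which of the eight surrounding positions the
    second patch was taken from, relative to the first.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    stride = patch_size + gap  # spacing between grid-cell centers

    # Choose a center location far enough from the border to fit the pair.
    margin = stride + patch_size // 2 + jitter
    cy = int(rng.integers(margin, h - margin))
    cx = int(rng.integers(margin, w - margin))

    label = int(rng.integers(0, 8))
    dy, dx = NEIGHBOR_OFFSETS[label]
    # A small random jitter discourages the network from solving the task
    # with trivial low-level cues such as edge continuity between patches.
    ny = cy + dy * stride + int(rng.integers(-jitter, jitter + 1))
    nx = cx + dx * stride + int(rng.integers(-jitter, jitter + 1))

    def crop(y, x):
        half = patch_size // 2
        return image[y - half:y + half, x - half:x + half]

    return crop(cy, cx), crop(ny, nx), label

# Usage sketch: both crops would be fed through a shared (Siamese-style)
# ConvNet, and an 8-way softmax classifier trained on the combined features.
if __name__ == "__main__":
    fake_image = np.random.rand(512, 512, 3)
    p1, p2, y = sample_patch_pair(fake_image)
    print(p1.shape, p2.shape, y)  # (96, 96, 3) (96, 96, 3) label in 0..7
```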