Understanding Deep Image Representations by Inverting Them

26 Nov 2014 | Aravindh Mahendran, Andrea Vedaldi
This paper presents a method to invert image representations, including SIFT, HOG, and deep convolutional neural networks (CNNs), by solving a regularized regression problem. The approach finds an image whose representation best matches a given code, combining a loss on the representation with a regularizer that captures natural-image priors.

The method is evaluated on both shallow and deep representations and reconstructs images more accurately than existing techniques. For shallow representations such as HOG and DSIFT, it outperforms recent alternatives and highlights differences in their invertibility. For deep CNNs, the inversions show that successive layers build progressively more invariance, both geometric and photometric: even the deeper layers retain detailed, photometrically accurate visual information, while the final layers capture only an abstract, sketch-like yet visually meaningful rendition of the original image. The analysis also reveals that CNN representations admit strong non-natural confounders, which can affect the reconstruction process unless suppressed by the regularizer. Overall, the method provides insight into what image representations capture at different levels of abstraction.
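The inversion described above can be sketched as gradient descent on a regularized reconstruction objective. The toy code below is a minimal illustration, not the paper's implementation: it stands in a random linear map for the representation Φ (the paper uses HOG, SIFT, and CNN features) and a simple quadratic finite-difference penalty for the paper's TV^β regularizer; all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8                        # toy "image" size (assumption for the demo)
D = 32                           # code dimensionality
A = rng.standard_normal((D, H * W)) / np.sqrt(H * W)

def phi(x):
    """Toy differentiable representation: a fixed linear map of the image.
    Stand-in for the paper's HOG/SIFT/CNN feature extractors."""
    return A @ x.ravel()

def tv_grad(x):
    """Gradient of a quadratic total-variation-style penalty
    (sum of squared finite differences), a stand-in for TV^beta."""
    g = np.zeros_like(x)
    dx = np.diff(x, axis=1)      # horizontal differences
    dy = np.diff(x, axis=0)      # vertical differences
    g[:, :-1] -= 2 * dx
    g[:, 1:]  += 2 * dx
    g[:-1, :] -= 2 * dy
    g[1:, :]  += 2 * dy
    return g

def invert(code, lam=1e-3, lr=0.05, steps=2000):
    """Gradient descent on ||phi(x) - code||^2 + lam * TV(x):
    find an image whose representation matches the given code."""
    x = np.zeros((H, W))
    for _ in range(steps):
        r = phi(x) - code                        # representation residual
        g_loss = (A.T @ (2 * r)).reshape(H, W)   # gradient of the data term
        x -= lr * (g_loss + lam * tv_grad(x))
    return x

x0 = rng.standard_normal((H, W))   # ground-truth "image"
code = phi(x0)                     # its representation (the target code)
x_rec = invert(code)
err = np.linalg.norm(phi(x_rec) - code) / np.linalg.norm(code)
```

Because the toy representation discards information (D < H·W), many images match the code; the regularizer selects among them, which mirrors how the paper's natural-image prior steers reconstructions away from non-natural confounders.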