Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue (2016) | Ravi Garg, Vijay Kumar B.G., Gustavo Carneiro, and Ian Reid
This paper proposes an unsupervised convolutional neural network (CNN) for single-view depth estimation, which does not require pre-training or annotated ground-truth depths. The network is trained using a method analogous to an autoencoder, where a pair of images with known camera motion is used to generate an inverse warp of the target image based on predicted depth. The photometric error in the reconstruction is used as the loss function for the encoder. This approach avoids the need for manual annotation or calibration of depth sensors, and is shown to perform comparably to state-of-the-art supervised methods on the KITTI dataset.
The proposed method trains the CNN on rectified stereo pairs: the encoder predicts a depth map for the source (left) image, and the decoder reconstructs that image by backward-warping the other view with the predicted disparities, which forces the encoder's output to behave as valid disparities. The reconstruction loss is minimized together with a smoothness prior on the disparities, which resolves the aperture problem in textureless regions. The network is trained end-to-end on this combination of reconstruction loss and smoothness regularization.
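The training signal described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes grayscale images, a simple 1-D linear interpolation in place of the paper's bilinear sampler, and a hypothetical smoothness weight `gamma`.

```python
import numpy as np

def inverse_warp(right, disparity):
    """Reconstruct the left image by sampling the right image at
    x - d(x, y) along each row, with linear interpolation (a simplified
    stand-in for the paper's differentiable bilinear warp)."""
    h, w = right.shape
    xs = np.arange(w)[None, :] - disparity          # source x-coords, (h, w)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    frac = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x0 + 1]

def training_loss(left, right, disparity, gamma=0.01):
    """Photometric reconstruction error plus a disparity-smoothness
    prior; `gamma` is an assumed regularization weight."""
    recon = inverse_warp(right, disparity)
    photometric = np.mean((left - recon) ** 2)
    # Smoothness prior: penalize horizontal and vertical disparity gradients.
    dx = np.diff(disparity, axis=1)
    dy = np.diff(disparity, axis=0)
    smoothness = np.mean(dx ** 2) + np.mean(dy ** 2)
    return photometric + gamma * smoothness
```

In the actual network the disparity map is the CNN's output and the loss is backpropagated through the warp; here the warp and loss are just evaluated directly to show the geometry of the supervision signal.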
The approach is evaluated on the KITTI dataset, where a network trained on less than half of the dataset achieves performance comparable to supervised methods across the standard depth-estimation metrics. The results demonstrate that the unsupervised CNN can reach state-of-the-art single-view depth estimation without labeled data or manual annotation, and that it generalizes well to unseen real-world data. The paper concludes that the approach is a promising alternative to supervised methods for single-view depth estimation, while noting that further research is needed to improve the network's performance.
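For reference, the depth-estimation metrics conventionally reported on KITTI can be computed as below. This is a generic sketch of the standard error and accuracy measures (absolute relative error, RMSE, and threshold accuracy), assuming `pred` and `gt` are arrays of strictly positive depths at valid ground-truth pixels; the function name and return layout are illustrative, not from the paper.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard single-view depth metrics: error measures (lower is
    better) and threshold accuracies (higher is better)."""
    abs_rel = np.mean(np.abs(pred - gt) / gt)                 # mean |d - d*| / d*
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                 # root mean squared error
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    # Fraction of pixels whose ratio max(d/d*, d*/d) is under 1.25^k.
    ratio = np.maximum(pred / gt, gt / pred)
    accuracy = {k: float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3)}
    return {"abs_rel": abs_rel, "rmse": rmse,
            "rmse_log": rmse_log, "accuracy": accuracy}
```

With a perfect prediction (`pred == gt`) every error term is zero and every threshold accuracy is 1.0, which is a quick sanity check when wiring up an evaluation.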