9 Jun 2014 | David Eigen, Christian Puhrsch, Rob Fergus
This paper presents a method for predicting depth from a single image using a multi-scale deep network. The approach employs two deep network stacks: one that makes a coarse global prediction from the entire image, and another that refines this prediction locally. A scale-invariant error is used to measure depth relations rather than absolute scale, which helps handle the inherent ambiguity of depth estimation from a single image. The method achieves state-of-the-art results on the NYU Depth and KITTI datasets and recovers detailed depth boundaries without requiring superpixelation.
Depth estimation is crucial for understanding the 3D geometry of a scene. Whereas stereo images permit depth recovery through local correspondence, estimating depth from a single image requires integrating both global and local cues, and the task is inherently ambiguous: much of the uncertainty stems from the unknown overall scale. The proposed method addresses this with a scale-invariant error that focuses on spatial relations within the scene rather than absolute scale, which is already sufficient for applications such as 3D modeling.
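Concretely, the paper defines this error in log space. With predicted depths $y$, ground truth $y^*$, and $n$ valid pixels:

$$D(y, y^*) = \frac{1}{2n}\sum_{i=1}^{n}\bigl(\log y_i - \log y_i^* + \alpha(y, y^*)\bigr)^2, \qquad \alpha(y, y^*) = \frac{1}{n}\sum_{i=1}^{n}\bigl(\log y_i^* - \log y_i\bigr)$$

Here $\alpha$ is exactly the global log-scale shift that minimizes the error, so every prediction is credited under its best possible overall scale. Equivalently, the error can be written purely in terms of pairwise depth relations,

$$D(y, y^*) = \frac{1}{2n^2}\sum_{i,j}\bigl((\log y_i - \log y_j) - (\log y_i^* - \log y_j^*)\bigr)^2$$

which makes explicit that only relative depths between pixel pairs matter.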
The method uses a neural network with two components: one estimates the global structure of the scene, while the other refines it using local information. The network is trained with a loss function that explicitly accounts for depth relations between pixel locations in addition to pointwise error. The system achieves state-of-the-art results on NYU Depth and KITTI, along with improved qualitative output.
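To make the loss concrete, here is a minimal PyTorch sketch (the framework and names are my choice, not the authors'). It implements the paper's training objective $L = \frac{1}{n}\sum_i d_i^2 - \frac{\lambda}{n^2}\bigl(\sum_i d_i\bigr)^2$ with $d_i$ the log-depth residual: $\lambda = 0$ is pointwise L2 in log space, $\lambda = 1$ recovers the scale-invariant error, and the paper uses $\lambda = 0.5$ as a compromise:

```python
import torch

def scale_invariant_loss(pred_log, target, lam=0.5, eps=1e-6):
    """Loss mixing pointwise log error with a scale-discounting term.

    pred_log: network output, interpreted as log depth (an assumption
              of this sketch); target: ground-truth depth over valid
              pixels only. lam interpolates between plain L2 in log
              space (0.0) and the fully scale-invariant error (1.0).
    """
    d = pred_log - torch.log(target + eps)   # log-depth residuals d_i
    n = d.numel()
    return (d ** 2).sum() / n - lam * d.sum() ** 2 / (n * n)
```

The second term discounts errors that amount to a single global scale shift, steering training toward the depth relations the scale-invariant metric measures.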
After reviewing related work on single-image and stereo depth estimation, the paper details its multi-scale architecture: a global coarse-scale network and a local fine-scale network. The coarse network sees the entire image and predicts the overall depth structure of the scene, while the fine network edits this coarse prediction to align it with local details such as object and wall edges.
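A rough PyTorch sketch of the two-stack design follows; it is my reconstruction, not the authors' code. Layer shapes loosely follow the paper's NYU Depth configuration (AlexNet-style coarse features, a 74x55 coarse output, and a fine stack that takes the coarse map as its 64th feature channel), and the interpolation step is an addition of this sketch to keep shapes compatible:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseNet(nn.Module):
    """Global coarse-scale stack: convolutions over the full image,
    then fully connected layers reshaped into a low-res depth map."""
    def __init__(self, out_h=55, out_w=74):
        super().__init__()
        self.out_h, self.out_w = out_h, out_w
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
        )
        self.fc1 = nn.LazyLinear(4096)  # infers flattened size on first call
        self.fc2 = nn.Linear(4096, out_h * out_w)

    def forward(self, x):
        f = self.features(x).flatten(1)
        f = F.relu(self.fc1(f))
        return self.fc2(f).view(-1, 1, self.out_h, self.out_w)

class FineNet(nn.Module):
    """Local fine-scale stack: edits the coarse prediction using local
    image information; the coarse map enters as an extra channel."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 63, 9, stride=2, padding=4)
        self.pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(64, 64, 5, padding=2)
        self.conv3 = nn.Conv2d(64, 1, 5, padding=2)

    def forward(self, x, coarse):
        f = self.pool(F.relu(self.conv1(x)))
        # Resize the coarse map to match, then stack it as channel 64.
        coarse = F.interpolate(coarse, size=f.shape[-2:], mode='bilinear',
                               align_corners=False)
        f = torch.cat([f, coarse], dim=1)
        return self.conv3(F.relu(self.conv2(f)))
```

In the paper, the coarse stack is trained first and then held fixed while the fine stack is trained, so the fine network learns only to edit the coarse prediction rather than redo it.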
The method is evaluated on the NYU Depth and KITTI datasets, where it achieves state-of-the-art results. Compared against other approaches, including Make3D, it improves on both scale-dependent and scale-invariant metrics, predicting both better depth relations and better overall scale. These results show that the approach effectively handles the core challenges of depth estimation from a single image.
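For reference, here is a minimal NumPy sketch of the standard metrics behind such comparisons (function name and reporting conventions are assumptions of this sketch; `pred` and `gt` are positive depth arrays restricted to valid pixels):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common single-image depth metrics: threshold accuracy,
    relative differences, RMSE variants, and the scale-invariant
    log error that scores depth relations rather than scale."""
    ratio = np.maximum(pred / gt, gt / pred)
    d = np.log(pred) - np.log(gt)
    return {
        "delta<1.25":   np.mean(ratio < 1.25),
        "delta<1.25^2": np.mean(ratio < 1.25 ** 2),
        "delta<1.25^3": np.mean(ratio < 1.25 ** 3),
        "abs_rel":      np.mean(np.abs(pred - gt) / gt),
        "sqr_rel":      np.mean((pred - gt) ** 2 / gt),
        "rmse":         np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log":     np.sqrt(np.mean(d ** 2)),
        # Equals the scale-invariant error D defined earlier.
        "scale_inv":    0.5 * (np.mean(d ** 2) - np.mean(d) ** 2),
    }
```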