9 Jun 2014 | David Eigen, Christian Puhrsch, Rob Fergus
This paper presents a method for predicting depth from a single image using a multi-scale deep network. The approach employs two deep network stacks: one that makes a coarse global prediction from the entire image, and another that refines this prediction locally. A scale-invariant error is used to measure depth relations rather than absolute scale, which helps handle the inherent ambiguity of depth estimation from a single image. The method achieves state-of-the-art results on the NYU Depth and KITTI datasets and recovers detailed depth boundaries without requiring superpixelation.
Depth estimation is crucial for understanding the 3D geometry of a scene. Whereas stereo images permit depth recovery through local correspondence, estimating depth from a single image requires integrating both global and local cues, and the task is inherently ambiguous: much of the uncertainty stems from the unknown overall scale. The proposed method addresses this with a scale-invariant error that focuses on spatial relations within the scene rather than absolute scale, which is already sufficient for applications such as 3D modeling.
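Concretely, the paper defines this error in log space. With predicted depths $y$, ground truth $y^*$, and $n$ valid pixels:

$$D(y, y^*) = \frac{1}{2n}\sum_{i=1}^{n}\bigl(\log y_i - \log y_i^* + \alpha(y, y^*)\bigr)^2, \qquad \alpha(y, y^*) = \frac{1}{n}\sum_{i=1}^{n}\bigl(\log y_i^* - \log y_i\bigr)$$

Here $\alpha$ is exactly the global log-scale shift that minimizes the error, so every prediction is credited under its best possible overall scale. Equivalently, the error can be written purely in terms of pairwise depth relations,

$$D(y, y^*) = \frac{1}{2n^2}\sum_{i,j}\bigl((\log y_i - \log y_j) - (\log y_i^* - \log y_j^*)\bigr)^2$$

which makes explicit that only relative depths between pixel pairs matter.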
The method uses a neural network with two components: one estimates the global structure of the scene, while the other refines it using local information. The network is trained with a loss function that explicitly accounts for depth relations between pixel locations in addition to pointwise error. The system achieves state-of-the-art results on NYU Depth and KITTI, along with improved qualitative output.
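To make the loss concrete, here is a minimal PyTorch sketch (the framework and names are my choice, not the authors'). It implements the paper's training objective $L = \frac{1}{n}\sum_i d_i^2 - \frac{\lambda}{n^2}\bigl(\sum_i d_i\bigr)^2$ with $d_i$ the log-depth residual: $\lambda = 0$ is pointwise L2 in log space, $\lambda = 1$ recovers the scale-invariant error, and the paper uses $\lambda = 0.5$ as a compromise:

```python
import torch

def scale_invariant_loss(pred_log, target, lam=0.5, eps=1e-6):
    """Loss mixing pointwise log error with a scale-discounting term.

    pred_log: network output, interpreted as log depth (an assumption
              of this sketch); target: ground-truth depth over valid
              pixels only. lam interpolates between plain L2 in log
              space (0.0) and the fully scale-invariant error (1.0).
    """
    d = pred_log - torch.log(target + eps)   # log-depth residuals d_i
    n = d.numel()
    return (d ** 2).sum() / n - lam * d.sum() ** 2 / (n * n)
```

The second term discounts errors that amount to a single global scale shift, steering training toward the depth relations the scale-invariant metric measures.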
After reviewing related work on single-image and stereo depth estimation, the paper details its multi-scale architecture: a global coarse-scale network and a local fine-scale network. The coarse network sees the entire image and predicts the overall depth structure of the scene, while the fine network edits this coarse prediction to align it with local details such as object and wall edges.
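A rough PyTorch sketch of the two-stack design follows; it is my reconstruction, not the authors' code. Layer shapes loosely follow the paper's NYU Depth configuration (AlexNet-style coarse features, a 74x55 coarse output, and a fine stack that takes the coarse map as its 64th feature channel), and the interpolation step is an addition of this sketch to keep shapes compatible:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseNet(nn.Module):
    """Global coarse-scale stack: convolutions over the full image,
    then fully connected layers reshaped into a low-res depth map."""
    def __init__(self, out_h=55, out_w=74):
        super().__init__()
        self.out_h, self.out_w = out_h, out_w
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
        )
        self.fc1 = nn.LazyLinear(4096)  # infers flattened size on first call
        self.fc2 = nn.Linear(4096, out_h * out_w)

    def forward(self, x):
        f = self.features(x).flatten(1)
        f = F.relu(self.fc1(f))
        return self.fc2(f).view(-1, 1, self.out_h, self.out_w)

class FineNet(nn.Module):
    """Local fine-scale stack: edits the coarse prediction using local
    image information; the coarse map enters as an extra channel."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 63, 9, stride=2, padding=4)
        self.pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(64, 64, 5, padding=2)
        self.conv3 = nn.Conv2d(64, 1, 5, padding=2)

    def forward(self, x, coarse):
        f = self.pool(F.relu(self.conv1(x)))
        # Resize the coarse map to match, then stack it as channel 64.
        coarse = F.interpolate(coarse, size=f.shape[-2:], mode='bilinear',
                               align_corners=False)
        f = torch.cat([f, coarse], dim=1)
        return self.conv3(F.relu(self.conv2(f)))
```

In the paper, the coarse stack is trained first and then held fixed while the fine stack is trained, so the fine network learns only to edit the coarse prediction rather than redo it.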
The method is evaluated on the NYU Depth and KITTI datasets, where it achieves state-of-the-art results. Compared against other approaches, including Make3D, it improves on both scale-dependent and scale-invariant metrics, predicting both better depth relations and better overall scale. These results show that the approach effectively handles the core challenges of depth estimation from a single image.
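For reference, here is a minimal NumPy sketch of the standard metrics behind such comparisons (function name and reporting conventions are assumptions of this sketch; `pred` and `gt` are positive depth arrays restricted to valid pixels):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common single-image depth metrics: threshold accuracy,
    relative differences, RMSE variants, and the scale-invariant
    log error that scores depth relations rather than scale."""
    ratio = np.maximum(pred / gt, gt / pred)
    d = np.log(pred) - np.log(gt)
    return {
        "delta<1.25":   np.mean(ratio < 1.25),
        "delta<1.25^2": np.mean(ratio < 1.25 ** 2),
        "delta<1.25^3": np.mean(ratio < 1.25 ** 3),
        "abs_rel":      np.mean(np.abs(pred - gt) / gt),
        "sqr_rel":      np.mean((pred - gt) ** 2 / gt),
        "rmse":         np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log":     np.sqrt(np.mean(d ** 2)),
        # Equals the scale-invariant error D defined earlier.
        "scale_inv":    0.5 * (np.mean(d ** 2) - np.mean(d) ** 2),
    }
```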