13 Mar 2017 | Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry
The paper introduces a novel deep learning architecture, GC-Net (Geometry and Context Network), for estimating disparity from stereo images. The key contributions include:
1. **End-to-End Learning**: The method is trained end-to-end without additional post-processing or regularization, achieving sub-pixel accuracy.
2. **Cost Volume Formation**: It leverages deep feature representations to form a cost volume, incorporating geometric knowledge.
3. **Contextual Information**: 3-D convolutions are used to learn and incorporate contextual information over the cost volume.
4. **Differentiable Soft Argmin**: A differentiable soft argmin operation is proposed to regress disparity values from the cost volume, allowing for accurate and smooth disparity estimation.
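Contributions 2 and 4 above can be illustrated with a minimal NumPy sketch. It assumes that the cost volume is built by concatenating left-image features with right-image features shifted by each candidate disparity, and that soft argmin is the expected disparity index under a softmax over negated costs. The function names `build_cost_volume` and `soft_argmin`, the array layouts, and the `max_disp` parameter are illustrative choices, not details given in this summary. The 3-D convolutions that would turn the cost volume into one cost per disparity are omitted.

```python
import numpy as np


def build_cost_volume(left_feat, right_feat, max_disp):
    """Pair each left feature with the right feature at every candidate disparity.

    left_feat, right_feat: arrays of shape (H, W, F).
    Returns an array of shape (max_disp + 1, H, W, 2 * F).
    Assumes max_disp < W; positions with no valid match stay zero.
    """
    h, w, f = left_feat.shape
    assert max_disp < w, "max_disp must be smaller than the image width"
    volume = np.zeros((max_disp + 1, h, w, 2 * f), dtype=left_feat.dtype)
    for d in range(max_disp + 1):
        # A left pixel at column x matches the right pixel at column x - d.
        volume[d, :, d:, :f] = left_feat[:, d:, :]
        volume[d, :, d:, f:] = right_feat[:, : w - d, :]
    return volume


def soft_argmin(cost):
    """Differentiable stand-in for argmin along the disparity axis.

    cost: array of shape (D, H, W), lower means a better match.
    Returns an (H, W) array: sum over d of d * softmax(-cost)[d].
    """
    logits = -np.asarray(cost, dtype=np.float64)
    # Subtract the per-pixel maximum for numerical stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disparities = np.arange(prob.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)
```

Because the output is a probability-weighted average rather than a hard index, it can take fractional values, which is what makes sub-pixel estimates and gradient-based training possible in this formulation.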
The method is evaluated on the Scene Flow and KITTI datasets, setting a new state-of-the-art benchmark on KITTI while being significantly faster than competing approaches. The paper also demonstrates that the model can learn semantic reasoning and handle challenging scenarios such as reflective surfaces and thin structures.