Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

Journal of Machine Learning Research 17 (2016) 1-32 | Jure Žbontar, Yann LeCun
This paper presents a method for extracting depth information from a rectified image pair by using a convolutional neural network (CNN) to compute the stereo matching cost, the first stage of many stereo algorithms. The CNN is trained, in a supervised manner, to learn a similarity measure on small image patches; the training set is a binary classification dataset of similar and dissimilar patch pairs, constructed from the ground truth disparity maps of the KITTI and Middlebury datasets. Two network architectures are evaluated: one tuned for speed and one for accuracy. The CNN output initializes the matching cost, which is then refined by a series of post-processing steps: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. The method is evaluated on the KITTI 2012, KITTI 2015, and Middlebury stereo datasets and outperforms other approaches on all three. The contributions are two CNN architectures for computing the stereo matching cost, a method with the lowest error rate on the three datasets, and experiments analyzing the importance of dataset size, the error rate compared to other methods, and the trade-off between accuracy and runtime. The paper extends previous work by including a new architecture, results on two new datasets, lower error rates, and more thorough experiments. It also discusses related work on stereo matching and compares the proposed method with other approaches.
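To make the dataset construction more concrete, the following is a minimal sketch of how one similar and one dissimilar patch pair might be drawn from a ground truth disparity map. It assumes that a similar pair is centered at (or very near) the true match in the right image and that a dissimilar pair is deliberately shifted away from it. The function names, the patch size, and the offset ranges (`pos_max`, `neg_low`, `neg_high`) are illustrative assumptions, not values taken from this summary.

```python
import numpy as np

def extract_patch(image, x, y, half):
    """Return the (2*half+1)-square patch centered at column x, row y,
    or None if the patch would extend outside the image."""
    h, w = image.shape
    if x - half < 0 or y - half < 0 or x + half >= w or y + half >= h:
        return None
    return image[y - half:y + half + 1, x - half:x + half + 1]

def make_pair_examples(left, right, disparity, x, y, rng,
                       half=4, pos_max=1, neg_low=4, neg_high=10):
    """Build one similar (label 1) and one dissimilar (label 0) example
    for the left-image pixel (x, y). All default values are hypothetical."""
    d = disparity[y, x]
    if not np.isfinite(d) or d <= 0:
        return []  # no usable ground truth at this pixel
    d = int(round(d))

    # Similar pair: right patch at the true match, plus a small random offset.
    o_pos = int(rng.integers(-pos_max, pos_max + 1))
    # Dissimilar pair: right patch shifted clearly away from the true match.
    o_neg = int(rng.integers(neg_low, neg_high + 1)) * int(rng.choice([-1, 1]))

    left_patch = extract_patch(left, x, y, half)
    pos_patch = extract_patch(right, x - d + o_pos, y, half)
    neg_patch = extract_patch(right, x - d + o_neg, y, half)
    if left_patch is None or pos_patch is None or neg_patch is None:
        return []
    return [(left_patch, pos_patch, 1), (left_patch, neg_patch, 0)]
```

Keeping the two examples for a pixel together means the classifier sees the same left patch once with a matching and once with a non-matching right patch, which is one simple way to keep the two classes balanced.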
The full stereo method applies cross-based cost aggregation, semiglobal matching, interpolation, subpixel enhancement, a median filter, and a bilateral filter after the matching cost is computed. The paper also discusses data augmentation techniques and runtime, showing that the fast architecture is significantly faster than the accurate one. Compared with other matching costs, including sum of absolute differences, the census transform, and normalized cross-correlation, the CNN-based cost performs best: across the three datasets the accurate architecture ranks first, followed by the fast architecture and the census transform. The paper concludes that the CNN-based approach is effective for stereo matching, with the accurate architecture outperforming other methods.
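For readers unfamiliar with what a matching cost is, the sketch below computes one of the classical baselines named above, sum of absolute differences, over a range of candidate disparities and then picks the lowest-cost disparity per pixel. This is not the CNN-based cost from the paper, and it omits all of the post-processing steps; the function names, the window size, and the winner-take-all selection are assumptions made for illustration.

```python
import numpy as np

def box_sum(a, half):
    """Sum of `a` over a (2*half+1)-square window, using edge padding."""
    padded = np.pad(a, half, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(2 * half + 1):
        for dx in range(2 * half + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def sad_cost_volume(left, right, max_disp, half=2):
    """cost[d, y, x] = windowed sum of |left(x, y) - right(x - d, y)|.
    Positions where x - d falls outside the right image stay at infinity."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Left column x is compared with right column x - d.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = box_sum(diff, half)
    return cost

def winner_take_all(cost):
    """Pick, for every pixel, the disparity with the lowest matching cost."""
    return np.argmin(cost, axis=0)
```

In the method described here, the cost volume would instead be filled from the network's similarity output, and the resulting volume would pass through cost aggregation and semiglobal matching before a disparity is selected.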