Understanding Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

The paper presents a method for extracting depth information from a rectified image pair, using a convolutional neural network (CNN) to compute the matching cost, which is the first stage of many stereo algorithms. The CNN learns a similarity measure on small image patches and is trained in a supervised manner on a binary classification dataset of similar and dissimilar patch pairs. Two network architectures are examined: one optimized for speed and another for accuracy. The output of the CNN initializes the matching cost, which is then refined through post-processing steps including cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. The method is evaluated on the KITTI 2012, KITTI 2015, and Middlebury stereo datasets, where it outperforms other approaches. The contributions of the paper include two CNN architectures for stereo matching cost computation, a method with the lowest reported error rate on these datasets, and experiments analyzing dataset size, error rate, and the trade-off between accuracy and runtime.
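
To make the idea of a patch comparison initializing the matching cost more concrete, here is a minimal NumPy sketch of a cost volume for a rectified pair, followed by a winner-take-all disparity choice. It is an illustration under stated assumptions, not the paper's implementation: `patch_cost` is a hypothetical stand-in that uses the absolute intensity difference where the paper uses the trained CNN, the function names are invented for this example, and none of the post-processing steps are included.

```python
import numpy as np


def patch_cost(left, right_shifted):
    # Hypothetical placeholder for the learned similarity measure.
    # The paper uses a trained CNN here; this sketch substitutes the
    # absolute intensity difference so that it runs without a model.
    return np.abs(left - right_shifted)


def cost_volume(left, right, max_disp):
    """Return costs of shape (max_disp + 1, height, width).

    Entry [d, y, x] is the cost of matching left pixel (y, x) with
    right pixel (y, x - d). Positions where x - d falls outside the
    image keep an infinite cost.
    """
    h, w = left.shape
    volume = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Columns d..w-1 of the left image line up with
        # columns 0..w-d-1 of the right image at disparity d.
        volume[d, :, d:] = patch_cost(left[:, d:], right[:, : w - d])
    return volume


def winner_take_all(volume):
    # Pick, for every pixel, the disparity with the lowest cost.
    return np.argmin(volume, axis=0)
```

With a synthetic pair in which the left image is the right image shifted by a fixed number of columns, `winner_take_all(cost_volume(left, right, max_disp))` should recover that shift wherever the match lies inside the image. On real images a raw cost volume of this kind is noisy, which is why the method refines it with the aggregation and filtering steps listed above.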

Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

17 (2016) 1-32 | Jure Žbontar, Yann LeCun