High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth

September 2-5, 2014 | Daniel Scharstein¹, Heiko Hirschmüller², York Kitajima¹, Greg Krathwohl¹, Nera Nešić³, Xi Wang¹, and Porter Westling⁴
We present a structured lighting system for generating high-resolution stereo datasets of static indoor scenes with subpixel-accurate ground-truth disparities. The system includes novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion. Combining disparity estimates from multiple projector positions, we achieve a disparity accuracy of 0.2 pixels on most observed surfaces, including in half-occluded regions. We contribute 33 new 6-megapixel datasets obtained with our system and demonstrate that they present new challenges for the next generation of stereo algorithms.
The system includes a portable stereo rig with two DSLR cameras and two point-and-shoot cameras, which allows capturing scenes outside the laboratory and simulating the diversity of Internet images. Accurate floating-point disparities are achieved via robust interpolation of lighting codes and efficient 2D subpixel correspondence search. Calibration and rectification accuracy are improved via bundle adjustment, and self-calibration of the structured light projectors, including lens distortion, is improved via robust model selection. Additional “imperfect” versions of all datasets exhibit realistic rectification errors with accurate 2D ground-truth disparities.
The system produces new stereo datasets with significantly higher quality than existing datasets. Each dataset consists of input images taken under multiple exposures and multiple ambient illuminations, with and without a mirror sphere present to capture the lighting conditions. We provide each dataset with both “perfect” and realistic “imperfect” rectification, with accurate 1D and 2D floating-point disparities, respectively.
We test our new datasets using three state-of-the-art stereo methods: a correlation method employing a 7×7 census transform and aggregation with overlapping 9×9 windows, the fast ELAS method by Geiger et al., and the semi-global matching (SGM) method by Hirschmüller. The results show that our new datasets provide a range of challenges that significantly exceed those of existing datasets. Imperfect rectification can yield significantly higher errors than accurate rectification, particularly in misaligned regions of high-frequency texture. Experiments demonstrate that our new datasets present a new level of challenge for stereo algorithms, both in terms of resolution and scene complexity. The challenge will be even greater when our images from different exposures, illuminations, or the point-and-shoot cameras are used. We hope that our datasets will inspire research into the next generation of stereo algorithms.
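To make the first of the three tested methods more concrete, the following is a minimal sketch of census-based matching, not the implementation used in the experiments. It assumes NumPy, grayscale rectified images stored as 2D arrays, and a left-to-right disparity convention in which left pixel (y, x) corresponds to right pixel (y, x - d). The function names (`census_transform`, `hamming`, `wta_disparity`) are hypothetical, and the aggregation over overlapping 9×9 windows mentioned above is deliberately omitted, so the result is a plain per-pixel winner-take-all estimate with integer disparities.

```python
import numpy as np


def census_transform(img, win=7):
    """Encode each pixel as a bit string over its win x win neighborhood.

    A bit is 1 where the neighbor is darker than the center pixel.
    A 7x7 window yields 48 bits, which fits in one uint64 per pixel.
    """
    r = win // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")  # replicate borders (an assumption)
    desc = np.zeros((h, w), dtype=np.uint64)
    for dy in range(win):
        for dx in range(win):
            if dy == r and dx == r:
                continue  # the center is not compared with itself
            neighbor = padded[dy:dy + h, dx:dx + w]
            bit = (neighbor < img).astype(np.uint64)
            desc = (desc << np.uint64(1)) | bit
    return desc


def hamming(a, b):
    """Per-element count of differing bits between two uint64 arrays."""
    x = a ^ b
    count = np.zeros(x.shape, dtype=np.uint8)
    while np.any(x):
        count += (x & np.uint64(1)).astype(np.uint8)
        x = x >> np.uint64(1)
    return count


def wta_disparity(left, right, max_disp, win=7):
    """Winner-take-all integer disparity from census Hamming costs.

    No cost aggregation is applied; columns x < d have no valid match
    at disparity d and keep a sentinel cost of 255.
    """
    cl = census_transform(left, win)
    cr = census_transform(right, win)
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), 255, dtype=np.uint8)
    for d in range(max_disp + 1):
        cost[d, :, d:] = hamming(cl[:, d:], cr[:, :w - d])
    return np.argmin(cost, axis=0)
```

Calling `wta_disparity(left, right, max_disp=8)` on two equally sized grayscale arrays returns one integer disparity per left-image pixel. Because the census descriptor depends only on the ordering of intensities within the window, this kind of cost is comparatively tolerant of exposure and illumination differences between the two views, which is one reason it is a common baseline.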