What can be seen in three dimensions with an uncalibrated stereo rig?

This paper explores what can be recovered in three dimensions from an uncalibrated stereo rig, that is, a rig for which no three-dimensional metric calibration data is available. The authors show that pixel correspondences between the two retinas are, on their own, enough to construct relatively rich non-metric reconstructions of the environment. Specifically, if five arbitrary correspondences are chosen, a projective representation of the environment, unique up to an arbitrary projective transformation, can be constructed relative to the five points in three-dimensional space that gave rise to the correspondences. If only four arbitrary correspondences are chosen, an affine representation of the environment can be constructed; it is defined up to an arbitrary affine transformation relative to the four points in three-dimensional space that gave rise to the correspondences. The reconstructed scene also depends on three arbitrary parameters, and two scenes reconstructed from the same set of correspondences with different parameter values are related by a projective transformation. The results suggest that computer vision may have overemphasized the importance of metric information from images, since accurate metric information is often difficult to obtain and requires complex calibration procedures. Relative information, which is less sensitive to these issues, may often be sufficient for applications such as robotics. The paper includes experimental results and discusses practical implications and directions for future research.
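To make "a projective representation relative to five points" concrete, the following is a minimal sketch of the underlying geometric fact, not the paper's reconstruction procedure (which works from image correspondences rather than known 3D points). It assumes five reference points in general position, given as homogeneous 4-vectors, and computes the coordinates of a further point in the projective basis they define. The function names `solve`, `matvec` and `projective_coords`, the sample points, and the matrix `H` are all hypothetical choices made for illustration. Exact rational arithmetic is used so that the invariance can be checked with plain equality.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: points are not in general position")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def matvec(H, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(Fraction(h) * Fraction(v) for h, v in zip(row, x)) for row in H]


def projective_coords(basis, X):
    """Coordinates of X in the projective basis defined by five points.

    The five basis points receive coordinates e1, e2, e3, e4 and (1, 1, 1, 1).
    The result is normalised so that its first non-zero entry equals 1.
    """
    A = basis[:4]
    # Matrix whose columns are the first four basis points.
    cols = [[A[j][i] for j in range(4)] for i in range(4)]
    # Weights lam such that sum_j lam[j] * A[j] equals the fifth basis point.
    lam = solve(cols, basis[4])
    # Rescale the columns by lam, then express X in the rescaled columns.
    M = [[lam[j] * A[j][i] for j in range(4)] for i in range(4)]
    x = solve(M, X)
    lead = next(v for v in x if v != 0)
    return [v / lead for v in x]


# Hypothetical reference points (x, y, z, 1) and one further scene point.
basis = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1]]
X = [2, 3, 5, 1]

# An arbitrary invertible 4x4 matrix, i.e. a projective transformation of space.
H = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]

# Transform every point, and rescale each one differently to emphasise that
# homogeneous coordinates are only defined up to a non-zero factor.
basis_h = [[s * v for v in matvec(H, p)] for s, p in zip([2, -1, 3, 5, 7], basis)]
X_h = [4 * v for v in matvec(H, X)]

print(projective_coords(basis, X) == projective_coords(basis_h, X_h))
```

The coordinates of `X` relative to the five reference points are unchanged by the transformation `H`, which is why a reconstruction expressed this way can only be determined up to an arbitrary projective transformation when nothing metric is known about the scene.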


Olivier D. Faugeras