MVSNet: Depth Inference for Unstructured Multi-view Stereo

17 Jul 2018 | Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan
MVSNet is an end-to-end deep learning architecture for depth map inference from unstructured multi-view images. The network first extracts deep visual features from the input images, then builds a 3D cost volume on the reference camera frustum using differentiable homography warping. To adapt to an arbitrary number N of input views, a variance-based cost metric maps the N warped feature volumes into a single cost volume. Multi-scale 3D convolutions regularize this volume, an initial depth map is regressed from it, and the result is refined with the reference image to generate the final output. The key innovation is encoding the camera parameters into the differentiable homography, which builds the cost volume directly on the camera frustum and thereby bridges 2D feature extraction and 3D cost regularization.
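To make the pipeline above concrete, here is a minimal PyTorch sketch of the cost-volume construction and depth regression. It is an illustration under stated assumptions, not the authors' released implementation: the function names and tensor shapes are hypothetical, the per-depth warping is written in the back-project/re-project form that is equivalent to a planar homography for fronto-parallel depth planes, and the 3D CNN that turns the cost volume into a probability volume is omitted.

```python
# Minimal sketch of MVSNet-style cost-volume construction and depth
# regression (PyTorch). Names and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def homography_warp(src_feat, K_src, K_ref, R, t, depth_values):
    """Warp source-view features into the reference frustum at each depth.

    src_feat:     [B, C, H, W] features of one source image
    K_src, K_ref: [B, 3, 3] camera intrinsics
    R, t:         [B, 3, 3], [B, 3, 1] relative pose (reference -> source)
    depth_values: [B, D] fronto-parallel depth hypotheses
    returns:      [B, C, D, H, W] warped feature volume
    """
    B, C, H, W = src_feat.shape
    D = depth_values.shape[1]

    # Pixel grid of the reference view in homogeneous coordinates.
    y, x = torch.meshgrid(
        torch.arange(H, dtype=src_feat.dtype, device=src_feat.device),
        torch.arange(W, dtype=src_feat.dtype, device=src_feat.device),
        indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], dim=0).reshape(1, 3, -1)

    # Back-project to each depth plane, transform into the source view,
    # and re-project; for fronto-parallel planes this equals applying
    # the per-depth planar homography.
    cam = torch.matmul(torch.inverse(K_ref), pix)            # [B, 3, H*W]
    cam = cam.unsqueeze(1) * depth_values.view(B, D, 1, 1)   # [B, D, 3, H*W]
    src = torch.matmul(R.unsqueeze(1), cam) + t.view(B, 1, 3, 1)
    src = torch.matmul(K_src.unsqueeze(1), src)              # [B, D, 3, H*W]
    xy = src[:, :, :2] / src[:, :, 2:3].clamp(min=1e-6)      # [B, D, 2, H*W]

    # Normalize to [-1, 1] and bilinearly sample (differentiable).
    gx = 2.0 * xy[:, :, 0] / (W - 1) - 1.0
    gy = 2.0 * xy[:, :, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(B, D * H, W, 2)
    warped = F.grid_sample(src_feat, grid, align_corners=True)
    return warped.view(B, C, D, H, W)

def variance_cost_volume(ref_feat, warped_feats):
    """Variance-based cost metric over all N views (reference + sources)."""
    D = warped_feats[0].shape[2]
    volumes = [ref_feat.unsqueeze(2).expand(-1, -1, D, -1, -1)] + warped_feats
    stacked = torch.stack(volumes, dim=0)       # [N, B, C, D, H, W]
    return stacked.var(dim=0, unbiased=False)   # population variance over views

def soft_argmin_depth(prob_volume, depth_values):
    """Expected depth along the hypothesis axis (soft argmin).

    prob_volume: [B, D, H, W], softmax-normalized over D by the
    (omitted) 3D regularization network.
    """
    B, D = depth_values.shape
    return torch.sum(prob_volume * depth_values.view(B, D, 1, 1), dim=1)
```

Because the variance is symmetric in its inputs, the same network handles any number of views without architectural changes, which is what lets MVSNet accept arbitrary N-view inputs.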
The network is trained on the DTU dataset, which provides ground-truth point clouds with normal information. Evaluated on DTU, MVSNet outperforms previous state-of-the-art methods in completeness and overall quality while being significantly faster, running in about 230 seconds per scan on a single 16 GB Tesla P100 GPU. Without any fine-tuning, it also ranked first on the Tanks and Temples benchmark (before April 18, 2018), demonstrating strong generalization to complex outdoor scenes. The final refinement step uses the reference image to improve the boundary accuracy of the estimated depth maps.
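The refinement step mentioned above can be pictured as a small residual network that concatenates the reference image with the initial depth map and predicts a depth residual. The sketch below is a hypothetical minimal version; the layer count and channel widths are assumptions, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DepthRefinement(nn.Module):
    """Residual refinement of the initial depth map, guided by the
    reference image (layer sizes are illustrative assumptions)."""
    def __init__(self):
        super().__init__()
        # 3 RGB channels + 1 depth channel in; predict a 1-channel residual.
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1))  # no activation on the residual

    def forward(self, ref_img, init_depth):
        # ref_img: [B, 3, H, W]; init_depth: [B, 1, H, W]
        x = torch.cat([ref_img, init_depth], dim=1)
        # Adding a learned residual keeps the coarse depth intact while
        # letting image edges sharpen the depth boundaries.
        return init_depth + self.net(x)
```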