GenS: Generalizable Neural Surface Reconstruction from Multi-View Images


4 Jun 2024 | Rui Peng¹,², Xiaodong Gu³, Luyang Tang¹, Shihe Shen¹, Fanqi Yu¹, Ronggang Wang¹,²
GenS is an end-to-end generalizable neural surface reconstruction model that addresses key limitations of existing multi-view surface reconstruction methods. Unlike coordinate-based methods that must optimize a separate network for each scene, GenS constructs a generalized multi-scale volume that directly encodes all scenes, allowing it to recover high-frequency details while maintaining global smoothness.

The model introduces a multi-scale feature-metric consistency that enforces multi-view consistency in a more discriminative multi-scale feature space, making it robust where photometric consistency fails. In addition, a view contrast loss improves geometric smoothness and accuracy when visible viewpoints are limited.

Trained end-to-end, GenS achieves state-of-the-art results on popular benchmarks such as DTU and BlendedMVS, outperforming existing methods, including those supervised with ground-truth depth, and generalizing well to new scenes. Its key contributions are a powerful multi-scale volume representation, a discriminative multi-scale feature-metric consistency, and a view contrast loss that enhances geometric accuracy. The model is efficient and effective in both generic and per-scene optimization settings.
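As a rough illustration of the feature-metric consistency idea, the sketch below projects candidate surface points into several views, samples multi-scale feature maps at the projections, and penalizes the discrepancy between reference-view and source-view features. This is a minimal PyTorch sketch based only on the description above; the function names (`project`, `sample_features`, `feature_metric_loss`), tensor shapes, and the L1 discrepancy are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of multi-scale feature-metric consistency:
# sample multi-scale features at the projections of surface points
# and penalize their disagreement across views.
import torch
import torch.nn.functional as F


def project(points, K, w2c):
    """Project world-space points (N, 3) into pixel coordinates (N, 2)."""
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)  # (N, 4)
    cam = (w2c @ homo.T).T[:, :3]                                       # camera space
    pix = (K @ cam.T).T
    return pix[:, :2] / pix[:, 2:].clamp(min=1e-6)                      # perspective divide


def sample_features(feat, pix, image_hw):
    """Bilinearly sample a feature map (C, H, W) at image-space pixels (N, 2).

    Pixel coordinates are normalized by the full image size, so the same
    projection can be used to sample feature maps of any resolution.
    """
    h, w = image_hw
    grid = torch.empty_like(pix)
    grid[:, 0] = 2.0 * pix[:, 0] / (w - 1) - 1.0   # x -> [-1, 1]
    grid[:, 1] = 2.0 * pix[:, 1] / (h - 1) - 1.0   # y -> [-1, 1]
    out = F.grid_sample(feat.unsqueeze(0), grid.view(1, 1, -1, 2), align_corners=True)
    return out.view(feat.shape[0], -1).T            # (N, C)


def feature_metric_loss(points, pyramids, Ks, w2cs, image_hw):
    """Average multi-view, multi-scale feature discrepancy at surface points.

    pyramids: one feature pyramid per view, each a list of (C, H, W) maps at
              different scales; view 0 is treated as the reference view.
    """
    loss, terms = 0.0, 0
    ref_pix = project(points, Ks[0], w2cs[0])
    for s, ref_map in enumerate(pyramids[0]):
        ref = sample_features(ref_map, ref_pix, image_hw)
        for v in range(1, len(pyramids)):
            src_pix = project(points, Ks[v], w2cs[v])
            src = sample_features(pyramids[v][s], src_pix, image_hw)
            loss = loss + (ref - src).abs().mean()   # L1 feature discrepancy
            terms += 1
    return loss / max(terms, 1)


# Toy usage with random data (2 views, 2 feature scales, 128x128 images).
pts = torch.rand(64, 3) + torch.tensor([0.0, 0.0, 2.0])
pyr = [[torch.rand(16, 64, 64), torch.rand(16, 32, 32)] for _ in range(2)]
K = torch.tensor([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
E = torch.eye(4)
print(feature_metric_loss(pts, pyr, [K, K], [E, E], (128, 128)))
```

Comparing features at several scales, rather than raw pixel colors, is what makes such a consistency term more discriminative in regions where photometric consistency breaks down, such as textureless or specular surfaces.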