SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization


2024-01-12 | Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaodi Wang
SD-MVS is a multi-view stereo (MVS) method that targets a persistent weakness of 3D reconstruction: textureless areas. It uses the Segment Anything Model (SAM) for instance segmentation and exploits the resulting edge information to guide patch deformation. The patch deformation is adaptive and enforces multi-scale consistency on both the matching cost and the propagation step, which improves the accuracy of depth estimation. The method also introduces spherical gradient refinement for normals and depths, and uses the Expectation-Maximization (EM) algorithm to optimize hyperparameters, reducing the dependence on empirical tuning. Evaluations on the ETH3D and Tanks and Temples (TNT) datasets show that SD-MVS achieves state-of-the-art results with reduced computation time.
SAM-based segmentation separates different instances and extracts subtle edge information, which makes depth estimation more robust. The architecture enforces multi-scale consistency in parallel, which reduces runtime. The completeness of the 3D models is improved by combining gradient descent on normals expressed in spherical coordinates with pixelwise search intervals on depths: the spherical gradient refinement uses random orthogonal vectors and gradient descent to refine normal directions, while the pixelwise depth interval search narrows the perturbation interval of each pixel for more accurate depth estimation. The EM-based hyperparameter optimization alternates between optimizing the aggregated cost and optimizing the hyperparameters, so the parameters are tuned automatically.
SD-MVS outperforms existing methods in both qualitative and quantitative comparisons, particularly in large textureless areas. It achieves high F1 scores on the ETH3D benchmark and competitive results on TNT. It is also efficient in memory and runtime, striking a balance between reconstruction quality and resource usage.
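The segmentation-driven idea can be illustrated with a much simpler rule than the paper's adaptive deformation: keep only the patch pixels that lie in the same SAM instance as the center pixel, so the matching window does not straddle an object edge. This is a minimal sketch under that assumption, not the SD-MVS implementation; the label-map format, the helper name `masked_patch_offsets`, and the `radius` and `min_pixels` parameters are all hypothetical.

```python
from typing import List, Tuple

Offset = Tuple[int, int]


def masked_patch_offsets(labels: List[List[int]], row: int, col: int,
                         radius: int = 2, min_pixels: int = 5) -> List[Offset]:
    """Patch offsets around (row, col) restricted to the center pixel's segment.

    `labels` is a per-pixel instance-ID map, e.g. derived from SAM masks.
    `radius` and `min_pixels` are hypothetical parameters of this sketch.
    """
    height, width = len(labels), len(labels[0])
    center_label = labels[row][col]
    square: List[Offset] = []
    same_instance: List[Offset] = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if not (0 <= r < height and 0 <= c < width):
                continue  # offset falls outside the image
            square.append((dr, dc))
            if labels[r][c] == center_label:
                same_instance.append((dr, dc))
    # Fall back to the plain square patch when the segment is too small
    # to give a stable matching cost.
    return same_instance if len(same_instance) >= min_pixels else square
```

For a 5-by-5 label map whose two left columns belong to one instance, the 3-by-3 patch around pixel (2, 2) keeps only the six offsets in its own column and the column to its right. The fallback to the full square patch is a design choice of this sketch: a tiny segment leaves too few pixels for a reliable matching cost.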
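The pixelwise depth interval search can be sketched for a single pixel as a coarse-to-fine 1-D search: start from an interval spanned by neighboring depth hypotheses, evaluate evenly spaced candidates, re-center on the best one, and shrink the interval. Initializing from neighbor depths, the halving schedule (`shrink`), the sample count, and the function name are assumptions of this sketch, not details given in the summary.

```python
from typing import Callable, List


def depth_interval_search(neighbor_depths: List[float],
                          cost: Callable[[float], float],
                          samples: int = 5, rounds: int = 6,
                          shrink: float = 0.5) -> float:
    """Coarse-to-fine 1-D depth search for one pixel (hypothetical sketch).

    The initial interval spans the neighboring depth hypotheses; each round
    evaluates `samples` evenly spaced candidates, re-centers on the best one,
    and shrinks the half-width by `shrink`.
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    low, high = min(neighbor_depths), max(neighbor_depths)
    center = 0.5 * (low + high)
    half_width = max(0.5 * (high - low), 1e-6)  # avoid a zero-width interval
    best = center
    for _ in range(rounds):
        step = 2.0 * half_width / (samples - 1)
        candidates = [center - half_width + k * step for k in range(samples)]
        # Keep the previous best as a candidate so the cost never gets worse.
        best = min(candidates + [best], key=cost)
        center = best
        half_width *= shrink
    return best
```

With neighbor depths 2.0 and 3.0 and a quadratic toy cost centered at 2.37, the half-width shrinks from 0.5 to about 0.016 in the final round, and the returned depth lies within 0.01 of the minimum.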
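The EM step alternates between an aggregated cost and the hyperparameters that weight it. The summary does not spell out the paper's formulation, so this sketch substitutes a standard mixture model: a pixel's per-view matching costs are drawn from a half-normal inlier distribution (scale `sigma`) mixed with a uniform outlier distribution on `[0, c_max]`. The E-step computes each view's inlier probability, the M-step updates `sigma` and the inlier fraction `pi_in` in closed form, and the aggregated cost is the probability-weighted mean. The mixture model and all parameter names are assumptions.

```python
import math
from typing import List, Tuple


def em_aggregate(costs: List[float], iters: int = 50, c_max: float = 2.0,
                 sigma: float = 0.3, pi_in: float = 0.5) -> Tuple[float, float, float]:
    """Return (aggregated_cost, sigma, pi_in) for one pixel's per-view costs.

    Hypothetical model: inlier costs ~ half-normal(sigma) (truncation at
    c_max is ignored), outlier costs ~ uniform on [0, c_max].
    """
    p_out = 1.0 / c_max
    weights = [pi_in] * len(costs)
    for _ in range(iters):
        # E-step: posterior probability that each view is an inlier.
        new_weights = []
        for c in costs:
            p_in = math.sqrt(2.0 / math.pi) / sigma * math.exp(-c * c / (2.0 * sigma * sigma))
            num = pi_in * p_in
            new_weights.append(num / (num + (1.0 - pi_in) * p_out))
        total = sum(new_weights)
        if total <= 1e-12:
            break  # every view looks like an outlier; keep the last estimates
        weights = new_weights
        # M-step: closed-form updates, clamped for numerical safety.
        pi_in = min(max(total / len(costs), 1e-3), 1.0 - 1e-3)
        sigma = max(math.sqrt(sum(w * c * c for w, c in zip(weights, costs)) / total), 1e-3)
    # Aggregated cost: inlier-probability-weighted mean of the per-view costs.
    return sum(w * c for w, c in zip(weights, costs)) / sum(weights), sigma, pi_in
```

For costs `[0.05, 0.08, 0.1, 0.06, 1.8, 1.9]` (four consistent views and two occluded ones), the two outliers receive near-zero weight, so the aggregated cost stays close to the inlier costs instead of being pulled up toward the plain mean of about 0.67.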
Ablation studies validate the effectiveness of each component and show that patch deformation and multi-scale consistency are crucial for accurate depth estimation. Overall, the design recovers textureless areas robustly and is practical for large-scale outdoor reconstruction.