14 Jun 2024 | Vincent Leroy, Yohann Cabon, Jerome Revaud
The paper "Grounding Image Matching in 3D with MASt3R" by Vincent Leroy, Yohann Cabon, and Jerome Revaud from NAVER LABS Europe proposes a novel approach to image matching that leverages 3D reconstruction capabilities. The authors argue that traditional 2D-based matching methods, while effective, are inherently limited by the 2D nature of pixel correspondences and can struggle under large viewpoint and illumination changes. To address this, they propose MASt3R, an extension of the DUSt3R framework, which predicts dense local features and trains them with a matching loss to improve the accuracy and robustness of correspondences.
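To make the matching loss concrete, here is a minimal sketch of an InfoNCE-style contrastive objective over ground-truth pixel correspondences. This is an illustration of the general technique the paper names, not the authors' implementation: the function name, the temperature value, and the assumption of L2-normalised per-pixel descriptors are all ours.

```python
import numpy as np

def _logsumexp(x, axis):
    # numerically stable log-sum-exp along one axis
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def infonce_matching_loss(D1, D2, pairs, temperature=0.07):
    """InfoNCE-style matching loss (illustrative sketch).

    D1, D2 : (N, d) arrays of L2-normalised per-pixel descriptors
             from the two images (flattened H*W grids).
    pairs  : list of (i, j) ground-truth correspondences; each true
             pair must beat all other sampled candidates, in both
             matching directions.
    """
    i_idx = np.array([i for i, _ in pairs])
    j_idx = np.array([j for _, j in pairs])
    # similarity between every queried descriptor pair, temperature-scaled
    sim = (D1[i_idx] @ D2[j_idx].T) / temperature
    # softmax cross-entropy in both directions; the diagonal holds the positives
    log_p_12 = sim - _logsumexp(sim, axis=1)
    log_p_21 = sim - _logsumexp(sim, axis=0)
    return -(np.diag(log_p_12).mean() + np.diag(log_p_21).mean()) / 2
```

With perfectly discriminative descriptors the loss approaches zero; mismatched pairs drive it up, which is the gradient signal that shapes the feature maps.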
MASt3R introduces a new head that outputs dense local feature maps, trained with an InfoNCE loss, to enhance the matching capabilities of DUSt3R. Additionally, to overcome the quadratic complexity of dense matching, which makes it computationally expensive for real-world applications, the authors propose a fast reciprocal matching scheme. This scheme significantly reduces the computational cost while maintaining or improving the quality of the matches.
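The core idea of fast reciprocal matching can be sketched as an iterative fixed-point search: start from a sparse subset of pixels in one image, alternately map each point to its nearest neighbour in the other image's feature map, and keep the points that converge to a mutual (reciprocal) nearest-neighbour pair. The sketch below assumes flattened, L2-normalised descriptor grids and illustrative parameter choices; it is not the authors' code.

```python
import numpy as np

def fast_reciprocal_matching(D1, D2, n_init=64, n_iters=10, seed=0):
    """Iterative reciprocal nearest-neighbour search (illustrative sketch).

    D1, D2 : (N, d) L2-normalised per-pixel descriptors (flattened H*W).
    Returns a sorted list of (i, j) reciprocal matches. Only a sparse
    set of n_init points is iterated, avoiding the quadratic cost of
    exhaustive dense matching.
    """
    rng = np.random.default_rng(seed)
    idx1 = rng.choice(len(D1), size=min(n_init, len(D1)), replace=False)
    matches = set()
    for _ in range(n_iters):
        # forward step: nearest neighbour of each current point in image 2
        idx2 = np.argmax(D1[idx1] @ D2.T, axis=1)
        # backward step: nearest neighbour of those points back in image 1
        idx1_back = np.argmax(D2[idx2] @ D1.T, axis=1)
        # points that cycle back to themselves are reciprocal matches
        done = idx1_back == idx1
        matches.update(zip(idx1[done].tolist(), idx2[done].tolist()))
        # unconverged points keep iterating from their back-projection
        idx1 = idx1_back[~done]
        if len(idx1) == 0:
            break
    return sorted(matches)
```

Because unconverged points jump toward the fixed points of the forward-backward map, most of them settle within a few iterations, which is where the speed-up over exhaustive mutual nearest-neighbour matching comes from.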
The paper evaluates MASt3R on several benchmarks, including the Map-free localization dataset, CO3D, RealEstate, and visual localization tasks. MASt3R outperforms state-of-the-art methods, achieving a 30% absolute improvement in VCRE AUC on the Map-free dataset. The authors also demonstrate the effectiveness of their approach in relative pose estimation, visual localization, and multi-view 3D reconstruction, showing that MASt3R can handle few-view scenarios and perform well even in zero-shot settings.
Overall, the paper highlights the importance of grounding image matching in 3D to improve its robustness and accuracy, making MASt3R a significant advancement in the field of 3D vision.