LoFTR: Detector-Free Local Feature Matching with Transformers

1 Apr 2021 | Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou
LoFTR is a novel detector-free method for local image feature matching. Instead of performing feature detection, description, and matching sequentially, LoFTR first establishes pixel-wise dense matches at a coarse level and then refines the good matches at a fine level. Unlike dense methods that rely on cost volumes to search for correspondences, LoFTR uses self and cross attention layers in Transformers to obtain feature descriptors that are conditioned on both images. The global receptive field provided by the Transformers enables LoFTR to produce dense matches in low-texture areas, where traditional feature detectors usually struggle to find repeatable interest points. Experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin and ranks first on two public benchmarks of visual localization.

The pipeline works in a coarse-to-fine manner. A local feature CNN first extracts coarse-level and fine-level feature maps from the image pair. The coarse features are then processed by a Transformer module with interleaved self and cross attention layers, producing position- and context-dependent feature descriptors. A differentiable matching layer compares the transformed features and yields a confidence matrix; matches with high confidence are selected and subsequently refined to sub-pixel accuracy at the fine level. Because attention over full coarse feature maps is expensive, LoFTR adopts a linear Transformer, which reduces the computational complexity of each attention layer from quadratic to linear in the number of coarse features.
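The two components at the heart of the coarse-level stage, linear (kernelized) attention and the dual-softmax matching layer, are sketched below in PyTorch. This is a minimal illustration rather than the authors' implementation: the elu(x) + 1 feature map follows the linear-attention formulation LoFTR builds on, while the temperature, confidence threshold, and tensor shapes are assumptions chosen for brevity.

```python
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with the elu(x)+1 feature map, linear in length.

    q, k, v: (B, N, H, D). For self attention q, k, v all come from the same
    image's coarse features; for cross attention q comes from one image and
    k, v from the other, so every descriptor is conditioned on both images.
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnhd,bnhe->bhde", k, v)            # key/value summary
    z = 1.0 / (torch.einsum("bnhd,bhd->bnh", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnhd,bhde,bnh->bnhe", q, kv, z)


def dual_softmax_matches(feat_a, feat_b, temperature=0.1, threshold=0.2):
    """Differentiable matching layer: similarity -> dual-softmax confidences,
    then mutual-nearest-neighbour selection above a confidence threshold.

    feat_a: (B, N, C) and feat_b: (B, M, C) transformed coarse features.
    Returns a boolean match mask (B, N, M) and the confidence matrix.
    """
    sim = torch.einsum("bnc,bmc->bnm", feat_a, feat_b) / temperature
    conf = F.softmax(sim, dim=1) * F.softmax(sim, dim=2)
    mask = conf > threshold
    mask &= conf == conf.max(dim=2, keepdim=True).values  # best in its row
    mask &= conf == conf.max(dim=1, keepdim=True).values  # best in its column
    return mask, conf
```

Taking the product of the row-wise and column-wise softmaxes keeps the matching step differentiable while enforcing mutual agreement between the two images; in the full model, the selected coarse matches are afterwards refined to sub-pixel positions on the fine-level feature maps.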
LoFTR is evaluated on several image matching and camera pose estimation tasks: homography estimation on HPatches, indoor relative pose estimation on ScanNet, and outdoor relative pose estimation on MegaDepth. On these benchmarks it outperforms both detector-based and detector-free baselines. The detector-free design avoids the repeatability problem of feature detectors, which makes LoFTR particularly effective in low-texture regions and regions with repetitive patterns, while the linear attention keeps the computation efficient. LoFTR is also evaluated for visual localization: it is robust under day-night illumination changes, extreme viewpoint changes, and repetitive patterns in outdoor scenes, and it achieves strong results on the Aachen Day-Night benchmark for long-term outdoor localization and on the InLoc benchmark for indoor localization, ranking first on these two public visual localization benchmarks.
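For context, the relative pose benchmarks score a matcher by feeding its predicted correspondences to a robust pose solver and comparing the recovered pose with the ground truth. The snippet below is a generic sketch of that downstream step using OpenCV rather than the paper's exact evaluation protocol; the shared intrinsics matrix K, the RANSAC confidence, and the one-pixel threshold are illustrative assumptions.

```python
import cv2
import numpy as np


def relative_pose_from_matches(kpts0, kpts1, K, ransac_px=1.0):
    """Recover the relative camera pose (R, and t up to scale) from matches.

    kpts0, kpts1: (N, 2) arrays of corresponding pixel coordinates produced by
    the matcher; K: 3x3 intrinsics, assumed shared by both views here purely
    for brevity (the benchmarks use per-image intrinsics).
    """
    kpts0 = np.asarray(kpts0, dtype=np.float64)
    kpts1 = np.asarray(kpts1, dtype=np.float64)
    E, inliers = cv2.findEssentialMat(
        kpts0, kpts1, K, method=cv2.RANSAC, prob=0.99999, threshold=ransac_px
    )
    _, R, t, _ = cv2.recoverPose(E, kpts0, kpts1, K, mask=inliers)
    return R, t, inliers.ravel().astype(bool)
```

The recovered rotation and (up-to-scale) translation are then compared against the ground-truth pose, for example by reporting the area under the curve of the pose error at several angular thresholds.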