SuperPoint: Self-Supervised Interest Point Detection and Description

19 Apr 2018 | Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a wide range of multiple-view-geometry problems in computer vision. The proposed method, called SuperPoint, is a fully convolutional neural network that jointly computes pixel-level interest point locations and associated descriptors in a single forward pass. Unlike patch-based neural networks, SuperPoint operates on full-sized images and does not require manual patch extraction or post-processing. The model is trained using a multi-scale, multi-homography approach called Homographic Adaptation, which boosts interest point detection repeatability and enables cross-domain adaptation (e.g., synthetic-to-real). When trained on the MS-COCO dataset with Homographic Adaptation, the model outperforms traditional corner detectors and other deep-learning-based methods at detecting interest points, and the final system achieves state-of-the-art homography estimation results on the HPatches dataset compared to LIFT, SIFT, and ORB.

The SuperPoint architecture consists of a shared encoder that reduces the input image dimensionality, followed by two decoder heads that learn task-specific weights for interest point detection and description. The interest point detection head computes a per-pixel probability map of "point-ness", while the descriptor head generates a dense map of L2-normalized, fixed-length descriptors. The model is trained with a combination of an interest point detection loss and a descriptor loss, the latter a hinge loss with positive and negative margins. Training starts on a synthetic dataset called Synthetic Shapes, which consists of simple geometric shapes with unambiguous interest point locations; this data is used to pre-train a base detector called MagicPoint, which is then adapted to real-world images using Homographic Adaptation.
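The descriptor hinge loss with positive and negative margins can be sketched as below. This is a minimal sketch for a single descriptor pair; the margin values (m_pos = 1, m_neg = 0.2) and the weighting term lambda_d match the settings reported in the paper, but treat the exact defaults here as assumptions.

```python
import numpy as np

def descriptor_hinge_loss(d, d_prime, s, m_pos=1.0, m_neg=0.2, lambda_d=250.0):
    """Hinge loss between two L2-normalized descriptors d and d'.

    s = 1 if the two cells correspond under the homography, else 0.
    lambda_d up-weights the (rarer) positive pairs against the many negatives.
    """
    dot = float(np.dot(d, d_prime))
    positive = lambda_d * s * max(0.0, m_pos - dot)   # pull matching pairs together
    negative = (1 - s) * max(0.0, dot - m_neg)        # push non-matching pairs apart
    return positive + negative

# A matching pair of identical unit descriptors incurs zero loss:
d = np.array([1.0, 0.0])
print(descriptor_hinge_loss(d, d, s=1))  # 0.0
```

Identical descriptors labeled as non-matching (s = 0) are penalized, since their dot product exceeds the negative margin.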
The proposed method is evaluated on the HPatches dataset, where it outperforms classical detectors in repeatability under illumination changes and performs comparably to SIFT under viewpoint changes. The model also achieves better homography estimation results than LIFT and ORB. The architecture is efficient, running at 70 FPS on 480×640 images with a Titan X GPU. The model is also robust to imaging noise and performs well across a variety of image textures and patterns. These results show that the proposed method is effective for geometric computer vision tasks such as homography estimation, image matching, and structure from motion. The paper concludes that the proposed self-supervised framework is a promising approach for training interest point detectors and descriptors applicable to a wide range of computer vision tasks.
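The Homographic Adaptation procedure described above (warp the image, detect, unwarp the detections, average) can be sketched as follows. The real method samples full random homographies and warps with interpolation; this dependency-free sketch substitutes integer translations via `np.roll` as stand-in warps, and `toy_detector` is a hypothetical placeholder for the MagicPoint base detector.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_detector(img):
    # Hypothetical stand-in for MagicPoint: treats brightness as "point-ness".
    return img.astype(float)

def homographic_adaptation(img, num_homographies=10, max_shift=3):
    """Average detector responses over random warps of the input image.

    Sketch of the warp -> detect -> unwarp -> aggregate loop; np.roll plays
    the role of applying a homography H and its inverse H^-1.
    """
    acc = toy_detector(img).copy()                      # identity warp is always included
    for _ in range(num_homographies - 1):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        warped = np.roll(img, (dy, dx), axis=(0, 1))    # "warp" by H
        heat = toy_detector(warped)
        acc += np.roll(heat, (-dy, -dx), axis=(0, 1))   # "unwarp" by H^-1
    return acc / num_homographies

img = np.zeros((8, 8))
img[4, 4] = 1.0
heat = homographic_adaptation(img)
print(heat[4, 4])  # 1.0: the response at the true point survives all warps
```

Responses that persist across many warps accumulate in the average, which is what makes the aggregated detector more repeatable than any single pass.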
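The homography estimation task used for evaluation can be illustrated with the classical Direct Linear Transform (DLT), which fits a 3×3 homography from four or more point correspondences; benchmarks of this kind score detectors and descriptors by how accurately the estimated H maps image corners. This is a generic textbook sketch, not the paper's specific evaluation code.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit H (3x3, up to scale) so dst ~ H @ src.

    src, dst: (N, 2) arrays of matched points, N >= 4. Each correspondence
    contributes two rows to the homogeneous system A h = 0; the solution is
    the right singular vector for the smallest singular value of A.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # normalize so H[2, 2] == 1

# Recover a pure translation by (2, 3) from four corner correspondences:
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = src + np.array([2.0, 3.0])
H = estimate_homography(src, dst)
```

With noisy matches, practical pipelines wrap such an estimator in RANSAC to reject outlier correspondences before scoring the result.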