OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

21 May 2024 | Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, André Araujo
OmniGlue is a novel learnable image matching method designed with generalization as a core principle. It leverages a vision foundation model to guide the feature matching process, improving its ability to generalize to domains unseen at training time. The method also introduces a keypoint position-guided attention mechanism that disentangles spatial and appearance information, leading to stronger matching descriptors.

Comprehensive experiments on seven datasets spanning diverse image domains, including scene-level, object-centric, and aerial images, show that OmniGlue achieves significant improvements over existing methods: relative gains of 20.9% on unseen domains compared to a directly comparable reference model, and 9.5% over the recent LightGlue method. The model is trained using a combination of domain-agnostic local features and foundation model guidance, and it also adapts effectively to target domains with limited training data, achieving up to 8.1% improvement. Compared against SuperGlue and LightGlue, OmniGlue's contributions yield stronger results in both in-domain and zero-shot generalization settings, making it a versatile and generalizable image matching solution.
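The foundation-model guidance can be pictured as follows: coarse, broadly pre-trained dense features (DINOv2-style) provide region-level similarity that indicates which keypoint pairs across the two images are worth attending to. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation; the function name, the patch-grid sampling, and the raw similarity output are all illustrative.

```python
import torch
import torch.nn.functional as F

def foundation_guidance(dense_feats_a, dense_feats_b, kpts_a, kpts_b, patch_size=14):
    """Sketch: derive a cross-image attention guide from foundation-model features.

    dense_feats_*: (H, W, C) dense patch features (e.g., from a DINOv2-style model).
    kpts_*: (N, 2) keypoint pixel coordinates, ordered as (x, y).
    Returns an (Na, Nb) similarity matrix; higher values suggest keypoint pairs
    lying in semantically related regions that deserve cross-image attention.
    """
    def sample(feats, kpts):
        # Look up the patch feature under each keypoint.
        idx = (kpts / patch_size).long()
        rows = idx[:, 1].clamp(0, feats.shape[0] - 1)
        cols = idx[:, 0].clamp(0, feats.shape[1] - 1)
        return feats[rows, cols]  # (N, C)

    fa = F.normalize(sample(dense_feats_a, kpts_a), dim=-1)
    fb = F.normalize(sample(dense_feats_b, kpts_b), dim=-1)
    return fa @ fb.t()  # cosine similarity between keypoint regions
```

In practice, such a guide could be thresholded or top-k-sparsified so that cross-image attention is computed only between plausibly related keypoints rather than all pairs.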
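The keypoint position-guided attention can be sketched as follows. The defining property is that positional encodings influence where attention looks (the queries and keys) but never enter the aggregated values, so the propagated descriptors remain purely appearance-based. This is a minimal single-head sketch under that reading of the abstract; the module and layer names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PositionGuidedAttention(nn.Module):
    """Attention in which keypoint positions guide the attention pattern
    while the updated descriptors carry appearance information only."""

    def __init__(self, dim: int):
        super().__init__()
        self.pos_embed = nn.Linear(2, dim)   # embed (x, y) keypoint positions
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, desc: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # desc: (N, dim) appearance descriptors; pos: (N, 2) keypoint coordinates.
        p = self.pos_embed(pos)
        q = self.q_proj(desc + p)  # positions shape the attention scores...
        k = self.k_proj(desc + p)
        v = self.v_proj(desc)      # ...but values stay position-free, keeping
                                   # spatial and appearance cues disentangled.
        attn = torch.softmax(q @ k.t() / desc.shape[-1] ** 0.5, dim=-1)
        return desc + self.out_proj(attn @ v)
```

Per the abstract, the intent of this disentanglement is that the matched descriptors are not contaminated by absolute-position cues, which helps them transfer across image domains.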