CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition


1 Apr 2024 | Feng Lu1,2, Xiangyuan Lan2*, Lijun Zhang3, Dongmei Jiang2, Yaowei Wang2, Chun Yuan1*
**Abstract:** Visual place recognition (VPR) aims to identify the geographical location of a query image by matching it against a geo-tagged database. Methods built on global features can be robust to viewpoint and condition changes but struggle with perceptual aliasing. This paper introduces CricaVPR, a novel method that leverages cross-image correlation to enhance the robustness of VPR. CricaVPR applies an attention mechanism across multiple images within a batch, so that each image representation is refined with cues from the others, producing features that are both more invariant and more discriminative. Additionally, a multi-scale convolution-enhanced adaptation method is proposed to adapt pre-trained visual foundation models to the VPR task, introducing multi-scale local information to improve performance. Experimental results on benchmark datasets show that CricaVPR outperforms state-of-the-art methods with significantly less training time.

**Contributions:**
1. **Cross-image Correlation-aware Representation:** CricaVPR uses attention to correlate multiple images within a batch, enhancing the robustness of each feature.
2. **Multi-scale Convolution-enhanced Adaptation:** a parameter-efficient adaptation method is designed to adapt pre-trained foundation models for VPR, introducing multi-scale local priors.
3. **Performance:** extensive experiments demonstrate that CricaVPR outperforms state-of-the-art methods on various VPR datasets, achieving higher Recall@1 with less training time.

**Related Work:** VPR methods have traditionally relied on global features, which often lack robustness in challenging environments and under perceptual aliasing. CricaVPR addresses these issues by leveraging cross-image correlation and multi-scale convolution-enhanced adaptation.
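The core idea of correlating images within a batch can be sketched as scaled dot-product self-attention applied across per-image descriptors, so each refined descriptor aggregates evidence from the other images in the batch. This is a minimal numpy illustration with random (untrained) projection matrices, not the paper's actual architecture; the function name and shapes are assumptions for demonstration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_image_attention(feats, rng=np.random.default_rng(0)):
    """Correlate all image descriptors in a batch via self-attention.

    feats: (B, D) array with one D-dimensional descriptor per image.
    Returns a (B, D) array where each descriptor has been refined with
    information from the other images, via a residual connection.
    Projection weights here are random stand-ins for learned parameters.
    """
    B, D = feats.shape
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(D), axis=-1)  # (B, B) cross-image weights
    return feats + attn @ V                        # residual refinement
```

In a trained model the attention weights would emphasize images depicting the same place, letting hard samples borrow discriminative cues from easier ones.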
**Methodology:**
- **Cross-image Correlation-aware Place Representation:** CricaVPR uses a cross-image encoder to correlate multiple image representations, enhancing the robustness of each feature.
- **Multi-scale Convolution-enhanced Adaptation:** a parameter-efficient adaptation method adapts the pre-trained backbone for VPR, introducing multi-scale local priors.

**Experiments:**
- **Datasets and Performance Evaluation:** CricaVPR is evaluated on several VPR benchmark datasets, including Pitts30k, MSLS, and Tokyo24/7, showing superior performance.
- **Implementation Details:** training settings and hyperparameters are provided.
- **Comparison with State-of-the-Art Methods:** CricaVPR outperforms several state-of-the-art methods in terms of Recall@1.
- **Ablation Study:** ablation experiments validate the effectiveness of each proposed component.

**Conclusions:** CricaVPR provides a robust global representation for VPR, addressing various challenges with cross-image correlation and multi-scale convolution-enhanced adaptation.
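The multi-scale convolution idea behind the adaptation method can be sketched as parallel convolutions at several kernel sizes whose outputs are fused and added back through a residual connection. This is a simplified numpy sketch of the concept only; the paper's actual adapter (including its bottleneck projections and placement inside the frozen backbone) differs, and all names and weights here are illustrative.

```python
import numpy as np


def conv2d_same(x, k):
    """'Same'-padded 2D cross-correlation of a single-channel map x with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def multi_scale_adapter(feat, rng=np.random.default_rng(0), scales=(1, 3, 5)):
    """Fuse local context at several receptive-field sizes into a feature map.

    feat: (C, H, W) feature map, e.g. from a frozen pre-trained backbone.
    Each scale applies a (k, k) convolution per channel; the averaged
    multi-scale response is added back residually. Weights are random
    stand-ins for learned adapter parameters.
    """
    C, H, W = feat.shape
    out = np.zeros_like(feat)
    for k in scales:
        kernel = rng.standard_normal((k, k)) / (k * k)  # illustrative weights
        for c in range(C):
            out[c] += conv2d_same(feat[c], kernel)
    return feat + out / len(scales)  # residual: multi-scale local priors
```

The residual form means the adapter only perturbs the frozen features, which is what makes such adaptation parameter-efficient: the backbone stays fixed and only the small convolutional branches are trained.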
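For reference, the Recall@N metric reported in the experiments can be computed as follows: a query counts as correct if any of its top-N database matches (by descriptor distance) is a ground-truth positive. This is a generic sketch of the standard VPR metric, with illustrative names and brute-force distance computation rather than the approximate search used in practice.

```python
import numpy as np


def recall_at_n(q_desc, db_desc, positives, ns=(1, 5, 10)):
    """Compute Recall@N for a set of queries against a database.

    q_desc:    (Q, D) query descriptors.
    db_desc:   (M, D) database descriptors.
    positives: list of Q sets, each holding the correct database indices
               (e.g. all images within 25 m of the query) for that query.
    Returns a dict {N: recall}.
    """
    # Brute-force Euclidean distances, (Q, M).
    dists = np.linalg.norm(q_desc[:, None, :] - db_desc[None, :, :], axis=-1)
    ranked = np.argsort(dists, axis=1)  # nearest database entries first
    recalls = {}
    for n in ns:
        hits = sum(bool(set(ranked[q, :n]) & positives[q])
                   for q in range(len(q_desc)))
        recalls[n] = hits / len(q_desc)
    return recalls
```

Higher Recall@1 means the single nearest neighbor in descriptor space is more often a true match, which is the headline number used when comparing VPR methods.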