CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
**Abstract:**
Visual place recognition (VPR) aims to identify the geographical location of an input image by retrieving matches from a geo-tagged database. Traditional methods rely on global features aggregated from one image at a time; these offer some robustness to viewpoint and condition changes but still struggle with perceptual aliasing. This paper introduces CricaVPR, a method that leverages cross-image correlation to make VPR representations more robust. CricaVPR uses an attention mechanism to correlate multiple images within a batch, so that each image representation incorporates information from the others, producing more invariant and discriminative features. In addition, a multi-scale convolution-enhanced adaptation method is proposed to adapt pre-trained visual foundation models to VPR, injecting multi-scale local information to improve performance. Experiments on benchmark datasets show that CricaVPR outperforms state-of-the-art methods with significantly less training time.
**Contributions:**
1. **Cross-image Correlation-aware Representation:** CricaVPR uses attention to correlate multiple images within a batch, enhancing the robustness of each image's representation.
2. **Multi-scale Convolution-enhanced Adaptation:** A parameter-efficient adaptation method is designed to adapt pre-trained models for VPR, introducing multi-scale local priors.
3. **Performance:** Extensive experiments demonstrate that CricaVPR outperforms state-of-the-art methods on various VPR datasets, achieving higher Recall@1 with less training time.
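The batch-level attention in Contribution 1 can be sketched as single-head self-attention in which the batch axis plays the role of the sequence axis, so each image's descriptor attends to every other image in the batch. This is an illustrative NumPy sketch under assumed shapes and names, not the paper's implementation (the actual cross-image encoder is transformer-based):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_image_attention(feats, Wq, Wk, Wv):
    """Single-head self-attention over the *batch* axis: the (B, B)
    attention matrix is the cross-image correlation, letting each
    image's descriptor borrow information from the other images."""
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (B, B) correlation
    return feats + attn @ V                         # residual update

# Toy usage: a batch of 8 images with 16-dim descriptors.
rng = np.random.default_rng(0)
B, D = 8, 16
feats = rng.standard_normal((B, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
out = cross_image_attention(feats, Wq, Wk, Wv)
```

Note the residual connection: with a zero value projection the input descriptors pass through unchanged, so the cross-image term acts as a learned refinement on top of each per-image feature.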
**Related Work:**
VPR methods have traditionally used global features, but these often lack robustness to challenging environments and perceptual aliasing. CricaVPR addresses these issues by leveraging cross-image correlation and multi-scale convolution-enhanced adaptation.
**Methodology:**
- **Cross-image Correlation-aware Place Representation:** CricaVPR uses a cross-image encoder to correlate multiple image representations, enhancing the robustness of each feature.
- **Multi-scale Convolution-enhanced Adaptation:** A parameter-efficient adaptation method is used to adapt pre-trained models for VPR, introducing multi-scale local priors.
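The adaptation idea above can be sketched as a bottleneck adapter (down-project, transform, up-project, residual add) whose middle step runs parallel depthwise convolutions at several kernel sizes over the backbone's 2D patch-token grid, supplying the multi-scale local prior. The adapter placement, activation, and kernel sizes below are assumptions for illustration, not the paper's specification:

```python
import numpy as np

def depthwise_conv2d(x, k):
    """'Same'-padded depthwise 2D convolution. x: (H, W, C); k: (kh, kw, C)."""
    H, W, C = x.shape
    kh, kw, _ = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum(axis=(0, 1))
    return out

def multi_scale_adapter(tokens, grid, W_down, W_up, kernels):
    """Bottleneck adapter over frozen-backbone patch tokens: down-project,
    apply parallel depthwise convs at several kernel sizes on the token
    grid (multi-scale local prior), up-project, and add residually."""
    H, W = grid
    h = np.maximum(tokens @ W_down, 0.0)          # down-projection + ReLU
    x = h.reshape(H, W, -1)
    y = sum(depthwise_conv2d(x, k) for k in kernels) / len(kernels)
    return tokens + y.reshape(H * W, -1) @ W_up   # residual connection

# Toy usage: a 4x4 grid of 32-dim patch tokens, bottleneck width 8,
# and kernel sizes 1x1, 3x3, 5x5.
rng = np.random.default_rng(0)
H, W, D, r = 4, 4, 32, 8
tokens = rng.standard_normal((H * W, D))
W_down = rng.standard_normal((D, r)) * 0.1
W_up = rng.standard_normal((r, D)) * 0.1
kernels = [rng.standard_normal((s, s, r)) * 0.1 for s in (1, 3, 5)]
out = multi_scale_adapter(tokens, (H, W), W_down, W_up, kernels)
```

Only the small adapter matrices and kernels would be trained, which is what makes this style of adaptation parameter-efficient relative to fine-tuning the whole foundation model.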
**Experiments:**
- **Datasets and Performance Evaluation:** CricaVPR is evaluated on several VPR benchmark datasets, including Pitts30k, MSLS, and Tokyo24/7, showing superior performance.
- **Implementation Details:** Training details and hyperparameters are provided.
- **Comparison with State-of-the-Art Methods:** CricaVPR outperforms several state-of-the-art methods in terms of Recall@1.
- **Ablation Study:** Ablation experiments validate the effectiveness of the proposed components.
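The Recall@1 figures reported above are standard retrieval recall: a query counts as correctly localized if any of its top-N retrieved database images is a true match (in practice, within some GPS distance threshold of the query). A minimal sketch of the metric, with illustrative names:

```python
import numpy as np

def recall_at_n(q_desc, db_desc, gt, ns=(1, 5)):
    """Recall@N for retrieval-based VPR.
    q_desc: (Q, D) L2-normalised query descriptors; db_desc: (M, D);
    gt: list giving, per query, the ground-truth database indices."""
    sims = q_desc @ db_desc.T               # cosine similarity
    ranked = np.argsort(-sims, axis=1)      # best match first
    return {n: float(np.mean([bool(set(ranked[q, :n]) & set(gt[q]))
                              for q in range(len(gt))])) for n in ns}

# Toy usage: 3 queries against a 5-image database; the third query's
# true match is deliberately not its nearest neighbour.
db = np.eye(5)
queries = db[[0, 2, 4]]
gt = [[0], [2], [3]]
scores = recall_at_n(queries, db, gt)
```

Here the third query retrieves database image 4 first while its ground truth is image 3, so it misses at N=1 but hits at N=5, giving Recall@1 of 2/3 and Recall@5 of 1.0.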
**Conclusions:**
CricaVPR provides a robust global representation for VPR, addressing various challenges with cross-image correlation and multi-scale convolution-enhanced adaptation.