CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition


1 Apr 2024 | Feng Lu, Xiangyuan Lan, Lijun Zhang, Dongmei Jiang, Yaowei Wang, Chun Yuan
This paper proposes CricaVPR, a robust global representation method for visual place recognition (VPR) that incorporates cross-image correlation awareness. The method uses an attention mechanism to correlate multiple images within a batch, enabling the model to harvest useful information from other images to enhance robustness. A multi-scale convolution-enhanced adaptation method is also introduced to adapt pre-trained visual foundation models to the VPR task, injecting multi-scale local information that further strengthens the cross-image correlation-aware representation. Experimental results show that CricaVPR outperforms state-of-the-art methods by a large margin with significantly less training time. The code is available at https://github.com/Lu-Feng/CricaVPR.

The paper addresses three key challenges in VPR: condition variations, viewpoint variations, and perceptual aliasing. Traditional methods often produce global features without considering cross-image variations, leading to limited robustness. CricaVPR uses cross-image correlation awareness to guide representation learning, producing more robust features. The approach builds on a Vision Transformer (ViT) backbone and its attention mechanism: the cross-image correlation-aware representation describes place images, while the multi-scale convolution-enhanced adaptation adapts the foundation model for VPR. The training strategy for fine-tuning is also presented. An ablation study on the proposed components shows that cross-image correlation awareness significantly improves performance.
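The core idea of cross-image correlation can be illustrated with a minimal sketch: treat each image's global descriptor in a batch as a token and apply self-attention across the batch, so every descriptor is refined using information from the other images. This is only an illustrative sketch, not the paper's implementation; `cross_image_attention` is a hypothetical name, and the identity projections stand in for the learned query/key/value weights a real model would use.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_image_attention(descriptors):
    """Self-attention across a batch of image descriptors (B, D).

    Each image's descriptor attends to every descriptor in the batch,
    so the output mixes in cues from correlated images. Identity
    projections replace the learned W_q/W_k/W_v for brevity.
    """
    q = k = v = descriptors                      # (B, D)
    scale = np.sqrt(q.shape[-1])
    attn = softmax(q @ k.T / scale)              # (B, B) cross-image weights
    return attn @ v                              # refined descriptors, (B, D)
```

In the paper's setting, images of the same place within a batch can reinforce each other's condition- and viewpoint-invariant cues through exactly this kind of batch-level attention.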
The method is evaluated on several VPR benchmark datasets, demonstrating its effectiveness in addressing various challenges in VPR. The paper concludes that CricaVPR provides a robust global representation for VPR, outperforming state-of-the-art methods by a significant margin. The method is efficient in terms of training time and data, making it suitable for real-world applications.
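The multi-scale convolution-enhanced adaptation can likewise be sketched in a few lines: run parallel convolutions at several kernel sizes over the ViT patch-token grid and sum the results, so the adapted features carry local context at multiple scales. This is a hedged sketch under assumptions, not the paper's code: `multi_scale_adapt` and `conv2d_same` are hypothetical names, a single channel is used, and uniform mean filters stand in for learned convolution weights.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * kernel).sum()
    return out

def multi_scale_adapt(feature_map, kernel_sizes=(1, 3, 5)):
    """Sum of parallel convolutions at several kernel sizes.

    Mean filters stand in for learned weights; the point is that each
    output location aggregates local context at multiple scales, as in
    a multi-scale convolutional adapter over the patch-token grid.
    """
    outs = [conv2d_same(feature_map, np.ones((k, k)) / (k * k))
            for k in kernel_sizes]
    return sum(outs)
```

A learned version would place such a branch inside each adapter module of the frozen foundation model, training only the adapter parameters during fine-tuning.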