Understanding Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

The paper introduces SelaVPR, a method for seamlessly adapting pre-trained foundation models to visual place recognition (VPR). It addresses the gap between the pre-training objective and the VPR task with a hybrid global-local adaptation approach. Specifically, the pre-trained model is kept frozen and only lightweight adapters are tuned, so that the same backbone yields both global and local features. Global adaptation adds adapters after the multi-head attention layer and in parallel to the MLP layer in each transformer block, while local adaptation uses up-convolutional layers to upsample the feature map. Additionally, a mutual nearest neighbor local feature loss is proposed to guide effective adaptation, which allows the re-ranking stage to skip time-consuming spatial verification. Experimental results show that SelaVPR outperforms state-of-the-art methods on several VPR benchmarks while using less training data and training time, and that it needs only 3% of the retrieval runtime of two-stage VPR methods that rely on RANSAC-based spatial verification. At the time of submission, the method ranked 1st on the MSLS challenge leaderboard.
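
To make the idea of re-ranking without spatial verification concrete, the following is a minimal sketch of mutual nearest neighbor matching between the local features of two images. It is an illustration under stated assumptions, not the authors' implementation: the function names `mutual_nn_matches` and `rerank` are hypothetical, cosine similarity is assumed as the similarity measure, and scoring a candidate by its number of mutual matches is an assumption about how such matches could be used.

```python
import numpy as np


def mutual_nn_matches(feats_q, feats_c):
    """Return (i, j) index pairs where query feature i and candidate
    feature j are each other's nearest neighbor (hypothetical helper)."""
    # L2-normalize rows so the dot product equals cosine similarity (assumed metric).
    q = feats_q / np.linalg.norm(feats_q, axis=1, keepdims=True)
    c = feats_c / np.linalg.norm(feats_c, axis=1, keepdims=True)
    sim = q @ c.T                      # shape: (num_query, num_candidate)
    nn_q2c = sim.argmax(axis=1)        # best candidate feature for each query feature
    nn_c2q = sim.argmax(axis=0)        # best query feature for each candidate feature
    idx_q = np.arange(sim.shape[0])
    mutual = nn_c2q[nn_q2c] == idx_q   # keep only pairs that agree in both directions
    return np.stack([idx_q[mutual], nn_q2c[mutual]], axis=1)


def rerank(feats_q, candidates):
    """Order candidate indices by number of mutual matches, highest first.
    Using the match count as the score is an assumption for illustration."""
    scores = np.array([len(mutual_nn_matches(feats_q, c)) for c in candidates])
    return np.argsort(-scores, kind="stable").tolist()


# Toy data: three orthogonal query features.
query = np.eye(3)
cand_a = query[[2, 0, 1]]              # same features in a different order
cand_b = np.tile(query[0], (3, 1))     # every feature identical to query feature 0

matches_a = mutual_nn_matches(query, cand_a)
order = rerank(query, [cand_b, cand_a])
```

In this toy example `cand_a` yields three mutual matches and `cand_b` only one, so `order` places `cand_a` (index 1) first. No geometric model is fitted at any point, which is what distinguishes this kind of matching from RANSAC-based verification.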

TOWARDS SEAMLESS ADAPTATION OF PRE-TRAINED MODELS FOR VISUAL PLACE RECOGNITION

3 Apr 2024 | Feng Lu1,2, Lijun Zhang3, Xiangyuan Lan2*, Shuting Dong1, Yaowei Wang2, Chun Yuan1*
