BoQ: A Place is Worth a Bag of Learnable Queries

12 May 2024 | Amar Ali-bey*, Brahim Chaib-draa, Philippe Giguère
The paper introduces a novel technique called Bag-of-Queries (BoQ) for visual place recognition (VPR), which learns a set of global queries to capture universal place-specific attributes. Unlike existing techniques that use self-attention and generate queries directly from the input, BoQ employs distinct learnable global queries that probe the input features via cross-attention, ensuring consistent information aggregation. BoQ integrates with both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) and provides an interpretable attention mechanism. Extensive experiments on 14 large-scale benchmarks demonstrate that BoQ consistently outperforms state-of-the-art techniques, including NetVLAD, MixVPR, and EigenPlaces, while being orders of magnitude faster and more efficient. BoQ also surpasses two-stage retrieval methods like Patch-NetVLAD, TransVPR, and R2Former, making it suitable for applications with limited computational resources but high accuracy and efficiency requirements. The code and model weights are publicly available.
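To make the core idea concrete, below is a minimal PyTorch sketch of the mechanism the summary describes: a fixed set of learnable global queries that probes backbone features through cross-attention, with the aggregated query outputs flattened into a single global descriptor. The class name BoQBlock, the hyperparameters (dim, num_queries, num_heads), and the simple flatten-based readout are illustrative assumptions, not the paper's exact architecture, which also includes additional refinement steps not shown here.

```python
import torch
import torch.nn as nn


class BoQBlock(nn.Module):
    """Minimal sketch of a Bag-of-Queries-style aggregator (illustrative,
    not the authors' exact implementation)."""

    def __init__(self, dim: int = 512, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        # Learnable global queries, shared across all input images
        # (unlike self-attention, they are not generated from the input).
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim))
        # Cross-attention: the queries attend to the backbone's local features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) local features from a CNN or ViT backbone.
        b = x.size(0)
        q = self.queries.expand(b, -1, -1)            # same queries for every image
        out, attn = self.cross_attn(q, x, x)          # queries probe the input features
        out = self.norm(out)
        # Flatten the aggregated query outputs into one global place descriptor;
        # `attn` could be visualized to interpret what each query responds to.
        return out.flatten(1)                         # (batch, num_queries * dim)


# Usage sketch: 14x14 = 196 patch tokens from a hypothetical backbone.
feats = torch.randn(2, 196, 512)
descriptor = BoQBlock()(feats)                        # shape: (2, 64 * 512)
```

Because the queries are parameters learned during training rather than projections of each input, every image is probed by the same "bag" of queries, which is what gives the aggregation its consistency and interpretability.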