NetVLAD: CNN architecture for weakly supervised place recognition

2 May 2016 | Relja Arandjelović, Petr Gronat, Akihiko Torii, Tomas Pajdla, Josef Sivic
The paper presents NetVLAD, a convolutional neural network (CNN) architecture for weakly supervised place recognition. It makes three main contributions: (1) NetVLAD, a generalized VLAD layer inspired by the Vector of Locally Aggregated Descriptors (VLAD) representation commonly used in image retrieval; (2) a training procedure based on a weakly supervised ranking loss, which learns the architecture's parameters from images depicting the same places over time, taken from Google Street View Time Machine; and (3) a demonstration that the proposed architecture significantly outperforms non-learned image representations and off-the-shelf CNN descriptors on challenging place recognition benchmarks and on standard image retrieval benchmarks.
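To make the idea of a generalized VLAD layer concrete, the following is a minimal sketch of a NetVLAD-style forward pass in plain Python, written for readability rather than speed. It assumes the usual formulation: each local descriptor is softly assigned to K clusters by a softmax over linear scores, the assignment-weighted residuals to the cluster centers are summed, and the result is normalized per cluster and then as a whole. The function names, argument layout, and toy values are illustrative assumptions, not the authors' code.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def netvlad_forward(descriptors, centers, weights, biases):
    """Aggregate N local D-dim descriptors into one (K*D)-dim vector.

    descriptors: N lists of length D (e.g. CNN feature-map columns)
    centers:     K lists of length D (cluster centers, trainable)
    weights:     K lists of length D (soft-assignment weights, trainable)
    biases:      K floats (soft-assignment biases, trainable)
    """
    num_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in descriptors:
        # Soft assignment of this descriptor to every cluster.
        scores = [
            sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
            for k in range(num_clusters)
        ]
        assign = softmax(scores)
        # Accumulate assignment-weighted residuals to each center.
        for k in range(num_clusters):
            for j in range(dim):
                vlad[k][j] += assign[k] * (x[j] - centers[k][j])
    # Intra-normalization per cluster, then global L2 normalization.
    vlad = [l2_normalize(row) for row in vlad]
    flat = [v for row in vlad for v in row]
    return l2_normalize(flat)


# Toy usage: 3 descriptors, D = 2, K = 2 (all values are made up).
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_centers = [[1.0, 0.0], [0.0, 1.0]]
toy_weights = [[2.0, 0.0], [0.0, 2.0]]
toy_biases = [-1.0, -1.0]
toy_output = netvlad_forward(toy_descriptors, toy_centers, toy_weights, toy_biases)
```

Every step is a sum, product, softmax, or normalization, so the whole computation is differentiable in the centers, weights, and biases. That is what allows such a layer to be trained by backpropagation.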
The NetVLAD layer can be plugged into any CNN architecture and trained via backpropagation. The weakly supervised ranking loss enables end-to-end learning, which makes the approach suitable for tasks with limited labeled data. The method is evaluated on two datasets, Pittsburgh (Pitts250k) and Tokyo 24/7, where it shows substantial improvements over baselines and state-of-the-art methods.
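A weakly supervised ranking loss of this kind can be sketched as follows. The sketch assumes a triplet-style formulation: for each query there is a set of potential positives (images tagged near the query location, not all of which show the same scene) and a set of definite negatives (images far away). The closest potential positive is treated as the match, and each negative is pushed beyond it by a margin. The function names and the margin value are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weak_ranking_loss(query, potential_positives, negatives, margin=0.1):
    """Hinge ranking loss with the best-matching potential positive.

    query:               image descriptor of the query (list of floats)
    potential_positives: descriptors of nearby images (weak labels)
    negatives:           descriptors of images known to be far away
    margin:              required gap between positive and negative distances
    """
    # Only the closest potential positive is assumed to be a true match.
    best_positive = min(squared_distance(query, p) for p in potential_positives)
    # Penalize every negative that is not at least `margin` farther away.
    return sum(
        max(0.0, best_positive + margin - squared_distance(query, n))
        for n in negatives
    )


# Toy usage with made-up 2-D descriptors.
toy_loss = weak_ranking_loss(
    query=[0.0, 0.0],
    potential_positives=[[1.0, 0.0], [0.1, 0.0]],
    negatives=[[0.2, 0.0], [2.0, 0.0]],
)
```

Taking the minimum over potential positives is what makes the supervision weak: the loss never needs to know which nearby image truly depicts the same place, only that at least one of them should rank ahead of all negatives.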