FAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION

Marius Muja, David G. Lowe
This paper presents a system that automatically selects the fastest approximate nearest-neighbor algorithm for a given dataset. Given a dataset and a desired precision, the system determines the best algorithm and its optimal parameter values. The authors also describe two algorithms: one that applies priority search to hierarchical k-means trees, and another that uses multiple randomized kd-trees. Both are shown to deliver significant improvements in query time over existing methods.

Nearest-neighbor search is particularly challenging in high-dimensional spaces, where exact algorithms often perform no better than a linear scan. Approximate algorithms can provide large speedups with only a minor loss in accuracy, but there has been little guidance on which algorithm and which parameter settings work best for a given dataset. Comparing a range of approximate nearest-neighbor algorithms on datasets of widely varying dimensionality, the authors find that the best choice depends on both the dataset and the desired precision: for some datasets the hierarchical k-means tree performs best, while for others multiple randomized kd-trees are more effective.

The first new algorithm modifies the earlier hierarchical k-means tree method by using a priority queue to expand the search in order of each k-means domain's distance from the query. This approach provides the best known performance on many datasets. For other datasets, the algorithm using multiple randomized kd-trees gives the best results. That algorithm had been proposed only recently and had not been widely tested; the authors show that, once optimal parameter values are determined, it often gives an order-of-magnitude improvement over the best previous method based on a single kd-tree.
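To make the priority-search idea concrete, here is a minimal, illustrative Python sketch of best-bin-first search over a hierarchical k-means tree. It is not the paper's implementation (the authors' reference implementation is the FLANN library); all function names here are our own, and a few iterations of plain Lloyd's k-means stand in for the clustering step.

```python
import heapq
import itertools
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=5, rng=random):
    """A few Lloyd iterations -- enough for an illustrative tree."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        centers = [[sum(xs) / len(g) for xs in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def build_tree(points, branching=4, leaf_size=8, rng=random):
    """Recursively cluster the points into `branching` k-means domains."""
    if len(points) <= max(leaf_size, branching):
        return ('leaf', points)
    centers, groups = kmeans(points, branching, rng=rng)
    pairs = [(c, g) for c, g in zip(centers, groups) if g]
    if len(pairs) < 2:            # degenerate clustering: stop splitting
        return ('leaf', points)
    return ('inner', [c for c, _ in pairs],
            [build_tree(g, branching, leaf_size, rng) for _, g in pairs])

def priority_search(tree, query, max_checks=64):
    """Best-bin-first: expand k-means domains in order of their center's
    distance to the query, stopping after `max_checks` points are examined."""
    counter = itertools.count()   # tie-breaker so heapq never compares nodes
    heap = [(0.0, next(counter), tree)]
    best_d, best_p, checks = float('inf'), None, 0
    while heap and checks < max_checks:
        _, _, node = heapq.heappop(heap)
        while node[0] == 'inner':
            _, centers, children = node
            ds = [dist2(query, c) for c in centers]
            nearest = min(range(len(ds)), key=ds.__getitem__)
            for i, child in enumerate(children):
                if i != nearest:  # queue the unexplored sibling domains
                    heapq.heappush(heap, (ds[i], next(counter), child))
            node = children[nearest]
        for p in node[1]:         # leaf: examine its points
            d = dist2(query, p)
            checks += 1
            if d < best_d:
                best_d, best_p = d, p
    return best_p

rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]
tree = build_tree(data, rng=rng)
approx = priority_search(tree, query, max_checks=128)
exact = min(data, key=lambda p: dist2(query, p))
```

Raising `max_checks` trades query time for precision, which is exactly the knob the paper's automatic configuration tunes.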
The authors also describe an automatic algorithm selection and configuration approach that determines the best algorithm and parameter settings for any given dataset, and that can easily be extended if other algorithms offering superior performance on particular datasets are identified in the future. They demonstrate the approach by matching image patches against a database of 100,000 other patches taken under different illumination and imaging conditions, obtaining a speed-up by a factor of 1,000 relative to linear search while still identifying 95% of the correct nearest neighbors.
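In spirit, the configuration step can be sketched as a search over candidate parameter settings, scoring each against ground-truth neighbors computed for a handful of sample queries. The sketch below is a deliberately simplified stand-in, not FLANN's actual optimizer: a "truncated scan" plays the role of a real tree index (its `checks` parameter trades time for precision), and all names are illustrative.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_nn(data, q):
    """Ground-truth nearest neighbor by linear scan."""
    return min(data, key=lambda p: dist2(q, p))

def approx_nn(data, q, checks):
    """Toy index: scan only the first `checks` points. Stands in for a
    tree index whose cost and accuracy both grow with `checks`."""
    return min(data[:checks], key=lambda p: dist2(q, p))

def auto_tune(data, target_precision, candidates, n_queries=50, rng=random):
    """Return the cheapest candidate setting whose measured precision on
    sample queries meets the target; fall back to the most thorough one."""
    queries = [[x + rng.gauss(0, 0.01) for x in rng.choice(data)]
               for _ in range(n_queries)]
    truth = [exact_nn(data, q) for q in queries]
    for checks in sorted(candidates):      # try cheapest settings first
        hits = sum(approx_nn(data, q, checks) == t
                   for q, t in zip(queries, truth))
        if hits / n_queries >= target_precision:
            return checks
    return max(candidates)

rng = random.Random(1)
data = [[rng.random() for _ in range(5)] for _ in range(1000)]
setting = auto_tune(data, target_precision=0.9,
                    candidates=[50, 100, 200, 400, 800, 1000], rng=rng)
```

The real system additionally weighs index build time and memory use against query time, and searches over structural parameters of the algorithms (such as the branching factor and the number of trees) as well.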
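The multiple-randomized-kd-trees idea can likewise be sketched in a few lines: each tree splits on a dimension chosen at random from among the highest-variance ones, and a single priority queue shared across all trees orders the backtracking. Again, this is an illustrative sketch with invented names, not the authors' implementation.

```python
import heapq
import itertools
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, rng, leaf_size=4, top_dims=3):
    """Median split on a dimension picked at random from the few
    highest-variance dimensions, so each tree partitions differently."""
    if len(points) <= leaf_size:
        return ('leaf', points)
    dims = len(points[0])
    var = []
    for d in range(dims):
        vals = [p[d] for p in points]
        mean = sum(vals) / len(vals)
        var.append(sum((v - mean) ** 2 for v in vals))
    ranked = sorted(range(dims), key=var.__getitem__, reverse=True)
    axis = rng.choice(ranked[:top_dims])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ('inner', axis, pts[mid][axis],
            build_kdtree(pts[:mid], rng, leaf_size, top_dims),
            build_kdtree(pts[mid:], rng, leaf_size, top_dims))

def multi_tree_search(trees, query, max_checks=64):
    """Search all trees at once: one priority queue, keyed by distance
    to the splitting plane, orders backtracking across every tree."""
    counter = itertools.count()   # tie-breaker so heapq never compares nodes
    heap = [(0.0, next(counter), t) for t in trees]
    heapq.heapify(heap)
    best_d, best_p, checks = float('inf'), None, 0
    while heap and checks < max_checks:
        _, _, node = heapq.heappop(heap)
        while node[0] == 'inner':
            _, axis, split, lo, hi = node
            diff = query[axis] - split
            near, far = (hi, lo) if diff >= 0 else (lo, hi)
            heapq.heappush(heap, (diff * diff, next(counter), far))
            node = near
        for p in node[1]:         # leaf: examine its points
            d = dist2(query, p)
            checks += 1
            if d < best_d:
                best_d, best_p = d, p
    return best_p

rng = random.Random(0)
data = [[rng.random() for _ in range(10)] for _ in range(1500)]
trees = [build_kdtree(data, random.Random(seed)) for seed in range(4)]
query = [rng.random() for _ in range(10)]
approx = multi_tree_search(trees, query, max_checks=200)
```

Because the trees split on different dimensions, their search orders are decorrelated, so a fixed `max_checks` budget spread across several trees tends to find the true neighbor more often than the same budget spent in a single tree.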