January 12, 2012 | Sariel Har-Peled, Piotr Indyk, Rajeev Motwani
This paper presents two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. The algorithms require space that is polynomial in the number of data points $ n $ and the dimension $ d $, while achieving query times that are sub-linear in $ n $ and polynomial in $ d $. The results are applied to other high-dimensional geometric problems, such as the approximate minimum spanning tree.
The nearest neighbor (NN) problem asks for the point in a set $ P $ that is closest to a query point $ q $. In high-dimensional spaces, classical algorithms become inefficient due to the "curse of dimensionality": their space or query-time requirements grow exponentially with the dimension $ d $. The paper sidesteps this by allowing approximation: an approximate algorithm may return any point whose distance to $ q $ is within a small factor of the distance to the true nearest neighbor, which enables efficient solutions even in high dimensions.
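For concreteness, the exact problem the paper approximates can be stated as a brute-force linear scan, whose $ O(nd) $ per-query cost is the baseline the paper's sub-linear query times improve on (a minimal sketch; the function name is ours, not the paper's):

```python
import math

def exact_nn(P, q):
    """Exact nearest neighbor by linear scan: O(n * d) per query.

    In high dimensions, classical space-partitioning structures
    degrade to roughly this cost, which motivates approximation.
    """
    best_point, best_dist = None, math.inf
    for p in P:
        dist = math.dist(p, q)  # Euclidean distance
        if dist < best_dist:
            best_point, best_dist = p, dist
    return best_point

P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(exact_nn(P, (0.9, 1.2)))  # → (1.0, 1.0)
```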
The first algorithm is a deterministic data structure for $ (1+\gamma)(1+\epsilon) $-NN queries with space and preprocessing time bounded by $ O(n \log^2 n) \times O(1/\epsilon)^d $ and query time $ O(d \log n) $. The second is a randomized data structure based on Locality-Sensitive Hashing (LSH), with space subquadratic in $ n $ and query time sub-linear in $ n $: $ O(dn + n^{1+1/c} \log^2 n \log \log n) $ space and $ O(dn^{1/c} \log^2 n \log \log n) $ query time for approximation factor $ c $.
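The LSH idea behind the second algorithm can be illustrated with the simplest locality-sensitive family, bit sampling for Hamming space: each of $ L $ hash tables keys points by $ k $ randomly chosen coordinates, so nearby points collide with noticeably higher probability than distant ones. This is an illustrative sketch only; the paper's construction chooses $ k $ and $ L $ as functions of $ n $ and $ c $ to get the stated bounds, and the function names here are ours:

```python
import random

def build_lsh(points, k=4, L=8, seed=0):
    """Bit-sampling LSH over binary vectors (illustrative sketch).

    Each of the L tables hashes a point to the tuple of its values at
    k randomly chosen coordinates; points at small Hamming distance
    agree on all k sampled bits with good probability and so collide.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(L):
        coords = [rng.randrange(dim) for _ in range(k)]
        buckets = {}
        for p in points:
            key = tuple(p[i] for i in coords)
            buckets.setdefault(key, []).append(p)
        tables.append((coords, buckets))
    return tables

def query_lsh(tables, q):
    """Return the candidate set: every point colliding with q in some table."""
    candidates = set()
    for coords, buckets in tables:
        key = tuple(q[i] for i in coords)
        candidates.update(buckets.get(key, []))
    return candidates

points = [(0, 1, 0, 1, 1, 0, 0, 1), (1, 1, 1, 1, 0, 0, 0, 0)]
tables = build_lsh(points, k=2, L=4)
# querying with a stored point always finds it among the candidates
```

The query then checks only the (typically few) candidates by direct distance computation, which is where the sub-linear query time comes from.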
The results are extended to $ \ell_s $ norms for $ s \in [1,2] $ via an embedding of $ \ell_s^d $ into $ \ell_1^{d'} $ that preserves distances up to a factor of $ 1 + \epsilon $. The algorithms also yield efficient approximate Voronoi decompositions and are used to solve other proximity problems, such as computing an approximate minimum spanning tree (MST).
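The flavor of such stable-distribution embeddings can be seen in the $ s = 2 $ case: if $ g $ is a standard Gaussian vector, then $ \mathbb{E}|\langle g, x\rangle| = \sqrt{2/\pi}\,\|x\|_2 $, so averaging $ |\langle g_j, x\rangle| $ over many independent projections concentrates around $ \|x\|_2 $, turning an $ \ell_2 $ norm into an $ \ell_1 $ norm of the image. The sketch below is only an illustration of this principle, not the paper's embedding, which covers all $ s \in [1,2] $ with explicit $ 1+\epsilon $ guarantees:

```python
import math
import random

def embed_l2_to_l1(x, proj):
    """Map x so the l1 norm of the image approximates ||x||_2.

    Each output coordinate is a Gaussian projection <g_j, x>, scaled so
    that summing absolute values averages |<g_j, x>| and cancels the
    sqrt(2/pi) factor.  (Illustrative s = 2 case of s-stable embeddings.)
    """
    d_out = len(proj)
    scale = math.sqrt(math.pi / 2) / d_out
    return [scale * sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in proj]

random.seed(1)
d, d_out = 8, 4000
proj = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d_out)]
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.25]
l2_norm = math.sqrt(sum(v * v for v in x))
l1_image = sum(abs(v) for v in embed_l2_to_l1(x, proj))
# l1_image concentrates around l2_norm as d_out grows
```

Because the map is linear, $ f(x) - f(y) = f(x - y) $, so preserving norms preserves pairwise distances, which is all the NN data structures need.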
The paper also discusses related work, including prior algorithms for approximate nearest neighbor, and presents a reduction from the approximate nearest neighbor problem to the near-neighbor problem. This reduction is based on a combination of approximate Voronoi decomposition and locality-sensitive hashing. The algorithms are shown to be efficient and practical for high-dimensional data, with the second algorithm being more suitable for practical applications due to its lower space penalty.
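The shape of such a reduction can be sketched as follows: an $ r $-near-neighbor structure answers "is there a point within distance $ r $ of $ q $?", and probing a geometric sequence of radii turns those yes/no answers into an approximate nearest neighbor. This is a simplified sketch under our own naming; a linear scan stands in for the paper's near-neighbor structure, and the paper replaces the linear sweep over radii with a more careful search and bounds the number of distinct radii needed:

```python
import math

def near_neighbor(P, q, r):
    """Naive r-near-neighbor oracle: some point within distance r of q,
    or None.  In the paper this role is played by an efficient
    (LSH-based) data structure, not a scan."""
    for p in P:
        if math.dist(p, q) <= r:
            return p
    return None

def approx_nn(P, q, r_min, r_max, eps):
    """Reduce (1+eps)-approximate NN to near-neighbor queries.

    Probe radii r_min, r_min*(1+eps), r_min*(1+eps)^2, ... and return
    the first hit: its distance to q exceeds the true nearest-neighbor
    distance by at most a (1+eps) factor (assuming that distance lies
    in [r_min, r_max]).
    """
    r = r_min
    while r <= r_max * (1 + eps):
        p = near_neighbor(P, q, r)
        if p is not None:
            return p
        r *= 1 + eps
    return None
```

Since each probe is a single near-neighbor query, the cost of approximate NN is the near-neighbor cost times the (logarithmic) number of radii probed.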