January 12, 2012 | Sariel Har-Peled, Piotr Indyk, Rajeev Motwani
The paper presents two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For datasets of size \( n \) in \( \mathbb{R}^d \), the algorithms require space that is polynomial in \( n \) and \( d \), while achieving query times that are sub-linear in \( n \) and polynomial in \( d \). The authors also demonstrate applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree.
The nearest neighbor (NN) problem involves finding the point closest to a query point \( q \) in a set \( P \) of \( n \) points in a metric space. The problem is central to many fields, including data compression, databases, machine learning, and signal processing. However, the curse of dimensionality makes exact solutions impractical: all known exact algorithms require either query time nearly linear in \( n \) or space and preprocessing exponential in \( d \).
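To make the baseline concrete, here is a minimal brute-force search in Python (array names and data are illustrative, not from the paper). Each query costs \( \Theta(dn) \), which is exactly the linear dependence on \( n \) that the paper's data structures reduce to sublinear:

```python
import numpy as np

def nearest_neighbor(P: np.ndarray, q: np.ndarray) -> int:
    """Exact nearest neighbor by linear scan: Theta(dn) per query.

    P is an (n, d) array of points, q a length-d query vector.
    This is the baseline against which sublinear-query structures
    are measured.
    """
    # Squared Euclidean distance to every point; the argmin is the NN.
    dists = np.sum((P - q) ** 2, axis=1)
    return int(np.argmin(dists))

# Toy usage:
rng = np.random.default_rng(0)
P = rng.normal(size=(1000, 64))
q = rng.normal(size=64)
print(nearest_neighbor(P, q))
```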
To address this, the paper introduces two main results:
1. A deterministic data structure answering \((1 + \gamma)(1 + \epsilon)\)-NN queries in \( O(d \log n) \) time, with space and preprocessing time bounded by \( O(n \log^2 n) \times O(1/\epsilon)^d \) (a toy sketch of the grid idea behind this result follows the list).
2. A randomized data structure for \((1 + \gamma)(1 + \epsilon)\)-NN queries, with space and preprocessing time bounded by \( n^{O(\log(1/\epsilon)/\epsilon^2)} \) and query time polynomial in \( d \), \( \log n \), and \( 1/\epsilon \).
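To give a feel for where the \( O(1/\epsilon)^d \) factor in the first result comes from, here is a toy uniform-grid \((r, (1+\epsilon)r)\)-near-neighbor structure. It is a simplified stand-in for the paper's approximate Voronoi machinery, not the actual construction: keeping one representative point per cell of width \( \epsilon r/\sqrt{d} \) is what the \( (1+\epsilon) \) slack pays for, and a query inspects \( O(1/\epsilon)^d \) cells for fixed \( d \).

```python
import itertools
import math
import numpy as np

class GridNearNeighbor:
    """Toy (r, (1+eps)r)-near-neighbor structure on a uniform grid.

    Illustrative only: a flat hash-of-cells version of the grid idea,
    not the paper's construction. Space is O(n); a query inspects
    O(1/eps)^d cells for fixed d, so it is usable only for small d.
    """

    def __init__(self, P: np.ndarray, r: float, eps: float):
        self.r, self.eps = r, eps
        d = P.shape[1]
        # Cell width eps*r/sqrt(d): points sharing a cell are within eps*r.
        self.w = eps * r / math.sqrt(d)
        self.cells = {}
        for p in P:
            key = tuple(np.floor(p / self.w).astype(int))
            # Keep one representative per cell; any discarded point lies
            # within eps*r of it, which the (1+eps) slack absorbs.
            self.cells.setdefault(key, p)

    def query(self, q: np.ndarray):
        """Return a point within (1+eps)*r of q, or None.

        Guarantee: if some stored point lies within r of q, then some
        point within (1+eps)*r is returned.
        """
        base = np.floor(q / self.w).astype(int)
        # Any point within (1+eps)*r differs by at most k cells per axis.
        k = math.ceil((1 + self.eps) * self.r / self.w) + 1
        for off in itertools.product(range(-k, k + 1), repeat=len(base)):
            p = self.cells.get(tuple(base + np.array(off)))
            if p is not None and np.linalg.norm(p - q) <= (1 + self.eps) * self.r:
                return p
        return None

# Toy usage in d = 2 (the cell enumeration is exponential in d):
rng = np.random.default_rng(0)
P = rng.uniform(size=(500, 2))
ds = GridNearNeighbor(P, r=0.05, eps=0.5)
print(ds.query(np.array([0.5, 0.5])))
```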
The algorithms are based on locality-sensitive hashing (LSH) and approximate Voronoi decompositions. The paper also discusses the reduction from the approximate nearest neighbor problem to the approximate near neighbor problem, which is crucial for constructing efficient data structures. The results complement each other, providing both theoretical and practical improvements in handling high-dimensional data.
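As a rough illustration of both ingredients, the sketch below combines a bit-sampling LSH table for the Hamming cube (in the spirit of the paper's LSH scheme) with the basic form of the nearest-to-near-neighbor reduction: probe a geometric sequence of radii and return the first hit. The parameter choices (`k`, `L`, the radius bracket `r_min`..`r_max`) are illustrative assumptions; the paper's actual reduction does not assume such a bracket is known and is considerably more careful.

```python
import random

class HammingLSH:
    """Toy bit-sampling LSH for near-neighbor search on the d-dimensional
    Hamming cube. Close points agree on most coordinates, so they collide
    in a table with high probability; far points rarely do.
    """

    def __init__(self, points, d, k, L, seed=0):
        rng = random.Random(seed)
        self.points = points
        # Each of the L tables hashes a point by k sampled coordinates.
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [dict() for _ in range(L)]
        for i, p in enumerate(points):
            for tbl, cs in zip(self.tables, self.coords):
                tbl.setdefault(tuple(p[c] for c in cs), []).append(i)

    def query(self, q, radius):
        """Return the index of some stored point within Hamming distance
        `radius` of q among the colliding candidates, or None."""
        for tbl, cs in zip(self.tables, self.coords):
            for i in tbl.get(tuple(q[c] for c in cs), []):
                if sum(a != b for a, b in zip(self.points[i], q)) <= radius:
                    return i
        return None

def approx_nearest(lsh, q, eps, r_min, r_max):
    """Simplified NN -> near-neighbor reduction: probe radii
    r_min, (1+eps)*r_min, ... and return the first hit. The first
    successful radius overshoots the true NN distance by at most a
    (1+eps) factor, so the answer is (1+eps)-approximate, up to the
    LSH structure's own slack and failure probability.
    """
    r = r_min
    while r <= r_max:
        i = lsh.query(q, radius=r)
        if i is not None:
            return i
        r *= 1 + eps
    return None

# Toy usage on 32-bit binary vectors (all parameters illustrative):
pts = [[random.Random(i).randrange(2) for _ in range(32)] for i in range(200)]
lsh = HammingLSH(pts, d=32, k=8, L=10)
print(approx_nearest(lsh, pts[0], eps=0.5, r_min=1, r_max=32))
```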