June 9–11, 2004, NewYork, USA | Mayur Datar, Nicole Immorlica, Piotr Indyk, Vahab S. Mirrokni
This paper presents a novel Locality-Sensitive Hashing (LSH) scheme for the Approximate Nearest Neighbor (ANN) problem under the $l_p$ norm, based on $p$-stable distributions. The scheme improves the running time of earlier algorithms for the $l_2$ norm and provides the first provably efficient AN algorithm for $p < 1$. It also guarantees finding the exact near neighbor in $O(\log n)$ time for data satisfying a "bounded growth" condition. Unlike previous schemes, this LSH works directly on points in Euclidean space without embeddings, resulting in simpler and faster query times. Experimental results on synthetic datasets show that the proposed algorithm is up to 40 times faster than the $kd$-tree algorithm. The paper includes theoretical and computational analyses, as well as empirical evaluations, demonstrating the effectiveness and efficiency of the proposed LSH scheme.This paper presents a novel Locality-Sensitive Hashing (LSH) scheme for the Approximate Nearest Neighbor (ANN) problem under the $l_p$ norm, based on $p$-stable distributions. The scheme improves the running time of earlier algorithms for the $l_2$ norm and provides the first provably efficient AN algorithm for $p < 1$. It also guarantees finding the exact near neighbor in $O(\log n)$ time for data satisfying a "bounded growth" condition. Unlike previous schemes, this LSH works directly on points in Euclidean space without embeddings, resulting in simpler and faster query times. Experimental results on synthetic datasets show that the proposed algorithm is up to 40 times faster than the $kd$-tree algorithm. The paper includes theoretical and computational analyses, as well as empirical evaluations, demonstrating the effectiveness and efficiency of the proposed LSH scheme.