Similarity Search in High Dimensions via Hashing

Similarity Search in High Dimensions via Hashing

1999 | ARISTIDES GIONIS, PIOTR Indyk, RAJEEV MOTWANI
The paper "Similarity Search in High Dimensions via Hashing" by Aristides Gionis, Piotr Indyk, and Rajeev Motwani addresses the challenge of performing similarity searches in high-dimensional spaces, which are common in applications such as image databases, document collections, and genome databases. The authors propose a novel scheme based on locality-sensitive hashing (LSH) to overcome the "curse of dimensionality" that plagues traditional methods like k-d trees. LSH hashes points in a way that increases the probability of collisions for points that are close to each other, allowing for efficient retrieval of approximate nearest neighbors. The paper provides theoretical guarantees and empirical evidence showing that LSH outperforms other methods, achieving sublinear time complexity in high dimensions and scaling well with data size and dimensionality. The authors also compare LSH with the Sphere/Rectangle-tree (SR-tree) data structure, demonstrating that LSH is significantly faster and more scalable. The paper concludes by discussing potential future improvements and applications of LSH in data mining and search for copyrighted video data.The paper "Similarity Search in High Dimensions via Hashing" by Aristides Gionis, Piotr Indyk, and Rajeev Motwani addresses the challenge of performing similarity searches in high-dimensional spaces, which are common in applications such as image databases, document collections, and genome databases. The authors propose a novel scheme based on locality-sensitive hashing (LSH) to overcome the "curse of dimensionality" that plagues traditional methods like k-d trees. LSH hashes points in a way that increases the probability of collisions for points that are close to each other, allowing for efficient retrieval of approximate nearest neighbors. The paper provides theoretical guarantees and empirical evidence showing that LSH outperforms other methods, achieving sublinear time complexity in high dimensions and scaling well with data size and dimensionality. The authors also compare LSH with the Sphere/Rectangle-tree (SR-tree) data structure, demonstrating that LSH is significantly faster and more scalable. The paper concludes by discussing potential future improvements and applications of LSH in data mining and search for copyrighted video data.
Reach us at info@study.space
[slides and audio] Similarity Search in High Dimensions via Hashing