Similarity Search in High Dimensions via Hashing

Similarity Search in High Dimensions via Hashing

1999 | ARISTIDES GIONIS, PIOTR INDYK, RAJEEV MOTWANI
This paper presents a novel method for approximate similarity search in high-dimensional spaces using locality-sensitive hashing (LSH). The authors address the challenge of efficiently performing similarity searches in high-dimensional data, where traditional methods like k-d trees suffer from the "curse of dimensionality." LSH is introduced as a technique that hashes points in such a way that the probability of collision is higher for points that are close to each other than for those that are far apart. This allows for efficient retrieval of approximate nearest neighbors. The paper describes an algorithm that uses LSH to index data, enabling fast approximate similarity searches. The algorithm is tested on two data sets: one consisting of color histograms and another containing texture features from aerial photographs. The results show that the LSH method significantly outperforms the SR-tree in terms of speed, especially for large data sets and high-dimensional spaces. The method is also shown to scale well with increasing data size and dimensionality. The authors demonstrate that by allowing a small error margin, the query time can be significantly improved. The algorithm is efficient and can be implemented with a running time that is sublinear in the data size. The paper also discusses the theoretical foundations of LSH, including its application to various similarity measures and its performance in different scenarios. The results indicate that LSH is a promising approach for high-performance similarity search in high-dimensional spaces.This paper presents a novel method for approximate similarity search in high-dimensional spaces using locality-sensitive hashing (LSH). The authors address the challenge of efficiently performing similarity searches in high-dimensional data, where traditional methods like k-d trees suffer from the "curse of dimensionality." LSH is introduced as a technique that hashes points in such a way that the probability of collision is higher for points that are close to each other than for those that are far apart. This allows for efficient retrieval of approximate nearest neighbors. The paper describes an algorithm that uses LSH to index data, enabling fast approximate similarity searches. The algorithm is tested on two data sets: one consisting of color histograms and another containing texture features from aerial photographs. The results show that the LSH method significantly outperforms the SR-tree in terms of speed, especially for large data sets and high-dimensional spaces. The method is also shown to scale well with increasing data size and dimensionality. The authors demonstrate that by allowing a small error margin, the query time can be significantly improved. The algorithm is efficient and can be implemented with a running time that is sublinear in the data size. The paper also discusses the theoretical foundations of LSH, including its application to various similarity measures and its performance in different scenarios. The results indicate that LSH is a promising approach for high-performance similarity search in high-dimensional spaces.
Reach us at info@study.space