Understanding Multi-Probe LSH%3A Efficient Indexing for High-Dimensional Similarity Search

This paper proposes a new indexing scheme called multi-probe LSH for efficient similarity search in high-dimensional data. The method improves upon existing LSH techniques by using a carefully derived probing sequence to check multiple hash buckets that are likely to contain the nearest neighbors of a query object. This approach reduces the number of hash tables required while maintaining similar query efficiency. The multi-probe LSH method is evaluated on two datasets: an image dataset with 1.3 million web images and an audio dataset with 2.6 million words. The results show that multi-probe LSH significantly improves upon basic and entropy-based LSH methods in terms of space and time efficiency. It reduces the number of hash tables by a factor of 14 to 18 while achieving similar query time. Compared to entropy-based LSH, it reduces space requirements by a factor of 5 to 8 and uses less query time. The method also outperforms entropy-based LSH in terms of probe efficiency, requiring fewer probes to achieve the same search quality. The multi-probe LSH method uses a query-directed probing sequence that is more efficient than step-wise probing. The results show that the query-directed probing sequence requires significantly fewer hash tables and probes to achieve the same search quality. The method is also less sensitive to the number of nearest neighbors (K) compared to other methods. The paper concludes that multi-probe LSH is a more efficient and effective method for high-dimensional similarity search.This paper proposes a new indexing scheme called multi-probe LSH for efficient similarity search in high-dimensional data. The method improves upon existing LSH techniques by using a carefully derived probing sequence to check multiple hash buckets that are likely to contain the nearest neighbors of a query object. This approach reduces the number of hash tables required while maintaining similar query efficiency. The multi-probe LSH method is evaluated on two datasets: an image dataset with 1.3 million web images and an audio dataset with 2.6 million words. The results show that multi-probe LSH significantly improves upon basic and entropy-based LSH methods in terms of space and time efficiency. It reduces the number of hash tables by a factor of 14 to 18 while achieving similar query time. Compared to entropy-based LSH, it reduces space requirements by a factor of 5 to 8 and uses less query time. The method also outperforms entropy-based LSH in terms of probe efficiency, requiring fewer probes to achieve the same search quality. The multi-probe LSH method uses a query-directed probing sequence that is more efficient than step-wise probing. The results show that the query-directed probing sequence requires significantly fewer hash tables and probes to achieve the same search quality. The method is also less sensitive to the number of nearest neighbors (K) compared to other methods. The paper concludes that multi-probe LSH is a more efficient and effective method for high-dimensional similarity search.

Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search

September 23-28, 2007 | Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li