May 19-21, 2002, Montreal, Quebec, Canada | Moses S. Charikar
This paper presents a novel approach to constructing locality sensitive hashing (LSH) schemes by leveraging the rounding procedures used in linear programming (LP) and semidefinite programming (SDP) based approximation algorithms. The key insight is that such rounding procedures can themselves be viewed as LSH schemes for various similarity measures. The paper introduces new LSH schemes for two types of objects: vectors and distributions on points in a metric space.
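In this framework, an LSH scheme for a similarity measure sim(x, y) taking values in [0, 1] is a distribution over a family F of hash functions whose collision probability equals the similarity:

$$\Pr_{h \in \mathcal{F}}\big[\, h(x) = h(y) \,\big] \;=\; \mathrm{sim}(x, y).$$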
For vectors, similarity is measured by the angle between them, and the LSH scheme is derived from random hyperplane rounding: a random hyperplane through the origin hashes each vector to a single bit, namely which side of the hyperplane it lies on. Concatenating the bits from several independent random hyperplanes yields a short binary sketch, and the Hamming distance between two sketches estimates the angle between the corresponding vectors. This similarity measure is closely related to the cosine similarity widely used in information retrieval.
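A minimal sketch of this random hyperplane scheme (the function names and parameters below are illustrative, not from the paper): a random Gaussian vector r hashes u to the bit sign(r · u); two vectors receive different bits with probability θ(u, v)/π, so the fraction of differing bits across many hyperplanes estimates the angle.

```python
import numpy as np

def simhash_bits(u, planes):
    """One bit per random hyperplane: the sign of u's projection onto its normal."""
    return planes @ u >= 0

def estimate_angle(bits_u, bits_v):
    """Each bit differs with probability theta/pi, so the fraction of
    differing bits, scaled by pi, estimates the angle theta."""
    return np.mean(bits_u != bits_v) * np.pi

rng = np.random.default_rng(0)
d, k = 100, 4096                       # dimension and number of random hyperplanes
planes = rng.standard_normal((k, d))   # each row is a random Gaussian normal vector

u = rng.standard_normal(d)
v = u + 0.5 * rng.standard_normal(d)

theta_hat = estimate_angle(simhash_bits(u, planes), simhash_bits(v, planes))
theta_true = np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
print(theta_hat, theta_true)           # the two values agree up to sampling error
```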
For distributions on points in a metric space, similarity is measured by the Earth Mover Distance (EMD). The paper presents an LSH scheme based on rounding algorithms for the classification problem with pairwise relationships; it approximates EMD within a factor of O(log n log log n), where n is the number of points in the metric space. The hash functions map distributions to points of the space, and the expected distance between the hash values of two distributions is at least their EMD and at most O(log n log log n) times their EMD.
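Concretely, the guarantee is roughly of the following form, where P and Q are distributions over the n points of the metric space, d is the underlying metric, and h is the randomized hash function:

$$\mathrm{EMD}(P, Q) \;\le\; \mathbb{E}\big[\, d\big(h(P),\, h(Q)\big) \,\big] \;\le\; O(\log n \log \log n) \cdot \mathrm{EMD}(P, Q).$$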
The paper also addresses which similarity measures can admit LSH schemes at all. It shows that certain common similarity measures, such as the Dice coefficient and the Overlap coefficient, admit no LSH scheme: whenever an LSH scheme exists, the corresponding distance 1 − sim(x, y) must satisfy the triangle inequality, and for these measures it does not.
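The obstruction can be seen in a small worked example (consistent with the paper's argument; the specific sets below are chosen here for illustration). If Pr[h(x) = h(y)] = sim(x, y), then 1 − sim(x, y) = Pr[h(x) ≠ h(y)] is an expectation of indicator distances and therefore satisfies the triangle inequality. The Dice coefficient violates this:

$$\text{For } A = \{a\},\; B = \{b\},\; C = \{a, b\}:\quad \mathrm{Dice}(A,B) = \frac{2|A \cap B|}{|A| + |B|} = 0,\qquad \mathrm{Dice}(A,C) = \mathrm{Dice}(B,C) = \tfrac{2}{3},$$

$$\text{so } 1 - \mathrm{Dice}(A,B) = 1 \;>\; \tfrac{1}{3} + \tfrac{1}{3} \;=\; \big(1 - \mathrm{Dice}(A,C)\big) + \big(1 - \mathrm{Dice}(C,B)\big).$$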
The paper concludes by highlighting the importance of LSH schemes in efficient similarity estimation and approximate nearest neighbor search. It also suggests future research directions, including the development of sketching functions for estimating information-theoretic measures of distance between distributions, such as the Kullback-Leibler divergence.