AN ALGORITHM FOR FINDING BEST MATCHES IN LOGARITHMIC EXPECTED TIME

AN ALGORITHM FOR FINDING BEST MATCHES IN LOGARITHMIC EXPECTED TIME

February 1975, Revised December 1975, Revised July 1976 | Jerome H. Friedman, Jon Louis Bentley, Raphsel Ari Finkel
The paper presents an algorithm and data structure for efficiently searching a file containing \( N \) records, each described by \( k \) real-valued keys, to find the \( m \) closest matches or nearest neighbors to a given query record. The algorithm is designed to minimize the expected number of records examined and the expected computation time, which is proportional to \( \log N \). The data structure, an optimized k-d tree, partitions the record space into hypercubical subregions, each containing a nearly equal number of records. The construction of the k-d tree requires \( kN \log N \) time, and the search time is logarithmic in the file size. The algorithm is applicable to various dissimilarity measures and does not require the triangle inequality. Empirical evidence suggests that the algorithm is significantly faster than other methods, especially for large files and higher dimensions. The paper also discusses the implementation and comparison with other methods, including a sorting algorithm, and provides insights into the performance of the algorithm under different conditions.The paper presents an algorithm and data structure for efficiently searching a file containing \( N \) records, each described by \( k \) real-valued keys, to find the \( m \) closest matches or nearest neighbors to a given query record. The algorithm is designed to minimize the expected number of records examined and the expected computation time, which is proportional to \( \log N \). The data structure, an optimized k-d tree, partitions the record space into hypercubical subregions, each containing a nearly equal number of records. The construction of the k-d tree requires \( kN \log N \) time, and the search time is logarithmic in the file size. The algorithm is applicable to various dissimilarity measures and does not require the triangle inequality. Empirical evidence suggests that the algorithm is significantly faster than other methods, especially for large files and higher dimensions. The paper also discusses the implementation and comparison with other methods, including a sorting algorithm, and provides insights into the performance of the algorithm under different conditions.
Reach us at info@study.space