A Survey of Recent Advances in Hierarchical Clustering Algorithms

A Survey of Recent Advances in Hierarchical Clustering Algorithms

1983 | F. Murtagh
This survey discusses recent advances in hierarchical clustering algorithms, emphasizing the use of efficient nearest neighbor searching to improve their performance. Hierarchical clustering algorithms traditionally require O(N²) time due to pairwise interobject proximities, but recent work has shown that this can be improved by incorporating efficient nearest neighbor techniques. The paper presents a general framework for agglomerative hierarchical clustering, highlighting new algorithmic approaches and recent results. The paper begins by discussing the challenges of hierarchical clustering, including the chaining effect and lack of cluster centers. It then reviews existing algorithms, such as Sibson's single linkage and Defays' complete linkage methods, which require all pairwise dissimilarities. The paper then introduces algorithms that focus on finding nearest neighbors, which are crucial for agglomeration steps. These algorithms are discussed in the context of the nearest neighbor graph (NN-graph), which helps identify reciprocal nearest neighbors (RNNs) and their properties. The reducibility property of RNNs is discussed, which allows for efficient merging of clusters without affecting the RNN properties of other parts of the graph. This property is used in the multiple cluster algorithm (Algorithm 3) for geometric clustering methods, which reduces the set of points by replacing RNNs with cluster centers. However, this algorithm has a worst-case time complexity of O(N³), which is improved by the single cluster algorithm (Algorithm 4), which uses NN-chains to grow clusters efficiently. The paper also discusses the construction of minimal spanning trees (MSTs) and their use in single linkage clustering. It presents algorithms for constructing MSTs using subgraphs and discusses the use of efficient nearest neighbor finding techniques, such as those based on multidimensional binary search trees and branch and bound methods. These techniques help reduce the computational complexity of nearest neighbor searches. The paper concludes by emphasizing the importance of nearest neighbor concepts in hierarchical clustering and the need for further research into efficient algorithms for high-dimensional spaces. It also highlights the versatility of subgraph approaches in incorporating various nearest neighbor finding techniques. Overall, the survey shows that recent advances in nearest neighbor searching have significantly improved the efficiency of hierarchical clustering algorithms.This survey discusses recent advances in hierarchical clustering algorithms, emphasizing the use of efficient nearest neighbor searching to improve their performance. Hierarchical clustering algorithms traditionally require O(N²) time due to pairwise interobject proximities, but recent work has shown that this can be improved by incorporating efficient nearest neighbor techniques. The paper presents a general framework for agglomerative hierarchical clustering, highlighting new algorithmic approaches and recent results. The paper begins by discussing the challenges of hierarchical clustering, including the chaining effect and lack of cluster centers. It then reviews existing algorithms, such as Sibson's single linkage and Defays' complete linkage methods, which require all pairwise dissimilarities. The paper then introduces algorithms that focus on finding nearest neighbors, which are crucial for agglomeration steps. These algorithms are discussed in the context of the nearest neighbor graph (NN-graph), which helps identify reciprocal nearest neighbors (RNNs) and their properties. The reducibility property of RNNs is discussed, which allows for efficient merging of clusters without affecting the RNN properties of other parts of the graph. This property is used in the multiple cluster algorithm (Algorithm 3) for geometric clustering methods, which reduces the set of points by replacing RNNs with cluster centers. However, this algorithm has a worst-case time complexity of O(N³), which is improved by the single cluster algorithm (Algorithm 4), which uses NN-chains to grow clusters efficiently. The paper also discusses the construction of minimal spanning trees (MSTs) and their use in single linkage clustering. It presents algorithms for constructing MSTs using subgraphs and discusses the use of efficient nearest neighbor finding techniques, such as those based on multidimensional binary search trees and branch and bound methods. These techniques help reduce the computational complexity of nearest neighbor searches. The paper concludes by emphasizing the importance of nearest neighbor concepts in hierarchical clustering and the need for further research into efficient algorithms for high-dimensional spaces. It also highlights the versatility of subgraph approaches in incorporating various nearest neighbor finding techniques. Overall, the survey shows that recent advances in nearest neighbor searching have significantly improved the efficiency of hierarchical clustering algorithms.
Reach us at info@study.space
[slides and audio] A Survey of Recent Advances in Hierarchical Clustering Algorithms