This paper by F. Murtagh of the Department of Computer Science, University College Dublin, surveys recent advances in hierarchical clustering algorithms. It challenges the common belief that hierarchical clustering must take at least \(O(N^2)\) time because it requires all pairwise interobject proximities. The author presents a general framework for hierarchical, agglomerative clustering algorithms and highlights potential improvements over current widely used methods.
The paper begins with an introduction to hierarchical clustering, emphasizing its flexibility and non-parametric nature. It reviews progress in algorithms for minimal spanning trees and the single linkage method, noting practical disadvantages such as the 'chaining' effect and the lack of a clear cluster center. Currently efficient hierarchical clustering algorithms, such as those of Sibson and Defays, are discussed, along with their limitations in computational complexity.
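The close relationship between single linkage and the minimal spanning tree can be made concrete: the sorted MST edge lengths are exactly the single-linkage merge heights. A minimal sketch using Prim's algorithm (the \(O(N^2)\) baseline; the function name and use of NumPy are illustrative assumptions, not the paper's code):

```python
import numpy as np

def single_linkage_heights(points):
    """Single-linkage merge heights via Prim's MST algorithm, O(N^2) overall."""
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    # dist[i] = distance from point i to the nearest tree vertex found so far
    dist = np.linalg.norm(points - points[0], axis=1)
    edges = []
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, dist)   # only points outside the tree
        j = int(np.argmin(cand))
        edges.append(float(cand[j]))             # MST edge joining j to the tree
        in_tree[j] = True
        dist = np.minimum(dist, np.linalg.norm(points - points[j], axis=1))
    # the sorted MST edge lengths equal the single-linkage merge heights
    return sorted(edges)
```

The 'chaining' effect mentioned above is visible here: a string of short MST edges can link two well-separated groups through intermediate points.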
The main focus of the paper is on algorithms that require only the nearest neighbor points for each object at any stage of clustering. These algorithms are presented at a high level, with detailed implementation considerations deferred to later sections. The complexity aspects of these algorithms are discussed, assuming brute-force nearest neighbor searching initially.
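Brute-force nearest neighbor searching, the baseline assumed in these complexity discussions, is a single linear scan per query: \(O(N)\) per point and \(O(N^2)\) to find every point's nearest neighbor. A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def nearest_neighbor(points, i):
    """Brute-force nearest neighbor of point i: one O(N) linear scan."""
    d = np.linalg.norm(points - points[i], axis=1)  # distances from point i
    d[i] = np.inf                                   # exclude the point itself
    j = int(np.argmin(d))
    return j, float(d[j])
```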
The paper then delves into specific algorithms, including a general hierarchical clustering algorithm using a dissimilarity-update formula, and algorithms that use original data and cluster data. These algorithms are designed to reduce storage requirements and computational complexity by focusing on the nearest neighbor problem.
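The dissimilarity-update formula for agglomerative methods is conventionally written in the Lance-Williams form: after merging clusters \(i\) and \(j\), the dissimilarity to any other cluster \(k\) is a parameterized combination of the old dissimilarities. A sketch under that standard convention (parameter names \(\alpha_i, \alpha_j, \beta, \gamma\) follow the usual notation; this is illustrative, not the paper's exact presentation):

```python
def lance_williams(d_ik, d_jk, d_ij, alpha_i, alpha_j, beta, gamma):
    """Dissimilarity d(i U j, k) after merging clusters i and j,
    in the standard Lance-Williams update form."""
    return (alpha_i * d_ik + alpha_j * d_jk
            + beta * d_ij + gamma * abs(d_ik - d_jk))

# Example parameterizations:
#   single linkage:   alpha_i = alpha_j = 0.5, beta = 0, gamma = -0.5  -> min(d_ik, d_jk)
#   complete linkage: alpha_i = alpha_j = 0.5, beta = 0, gamma = +0.5  -> max(d_ik, d_jk)
```

Storing and updating only such cluster-level quantities, rather than all pairwise proximities, is what allows the storage reductions described above.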
Fast algorithms for geometric clustering methods, such as the multiple cluster algorithm and the single cluster algorithm, are presented. These algorithms leverage the concept of reciprocal nearest neighbors (RNNs) and the reducibility property to improve efficiency. The paper also discusses techniques for finding nearest neighbors efficiently, including hashing, multidimensional binary search trees, and branch and bound approaches.
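The RNN idea can be sketched as the NN-chain algorithm: follow nearest neighbors from an arbitrary start until a reciprocal pair appears, merge that pair, and continue from the remaining chain. This is valid whenever the reducibility property holds (merging two clusters never brings the merged cluster closer to any third cluster than both constituents were). A minimal sketch on a precomputed dissimilarity matrix with the single-linkage update, which satisfies reducibility (illustrative, not the paper's exact formulation):

```python
import numpy as np

def nn_chain(d):
    """Agglomerate via the NN-chain algorithm on dissimilarity matrix d,
    using the single-linkage update.  Returns (i, j, height) merges."""
    d = d.astype(float)
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)
    active = set(range(n))
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))     # start a new chain anywhere
        while True:
            i = chain[-1]
            # nearest active neighbor of the chain tip
            j = min((k for k in active if k != i), key=lambda k: d[i, k])
            if len(chain) > 1 and j == chain[-2]:
                break                            # i and j are reciprocal NNs
            chain.append(j)
        i, j = chain.pop(), chain.pop()
        lo, hi = min(i, j), max(i, j)
        merges.append((lo, hi, d[lo, hi]))
        # single-linkage update: merged cluster keeps the smaller index
        for k in active:
            if k not in (lo, hi):
                d[lo, k] = d[k, lo] = min(d[lo, k], d[hi, k])
        active.remove(hi)
    return merges
```

Reducibility is what lets the algorithm resume from the surviving chain after each merge instead of restarting, which is the source of its efficiency.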
Finally, the paper concludes with a discussion on open problems and future directions, emphasizing the need for further research in nearest neighbor finding techniques, particularly in high-dimensional spaces. The author highlights the importance of concepts like mutual nearest neighbors, NN-chains, and NN-graphs in the development of efficient hierarchical clustering algorithms.