Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

December 13, 2011 | Fionn Murtagh (1) and Pierre Legendre (2)
Ward's hierarchical clustering method, widely used since its introduction in 1963, has been implemented in various software systems with differing interpretations of the agglomerative criterion. This paper clarifies the distinctions between implementations, particularly between Ward1 and Ward2, and highlights the importance of input formatting and algorithmic choices.

The Ward method minimizes the increase in within-cluster variance, and its implementation depends on whether the input distances are squared. The Lance-Williams formula is used to update dissimilarities during agglomeration. The paper discusses two main implementations: Ward1 takes squared Euclidean distances as input and minimizes the variance of the new cluster, while Ward2 takes Euclidean distances and minimizes the increase in total within-cluster sum of squared error. Despite these differences, the two implementations can produce identical results under specific conditions, such as when inputs are squared for Ward1 or unsquared for Ward2.

The paper also presents case studies comparing implementations, including the R functions hclust, agnes, and hclust.PL. These studies show that the outputs of these functions can be identical under certain conditions, for example when inputs are squared or when node heights are square-rooted. The results demonstrate that the choice of input format and algorithmic parameters significantly affects the output of hierarchical clustering.

The paper concludes that while Ward1 and Ward2 can produce identical results under specific conditions, they differ in their interpretation of the clustering criterion. The Ward2 implementation is more directly suitable for comparing ultrametric distances with input distances, whereas Ward1 requires taking the square root of node heights for such comparisons.
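The Lance-Williams update for Ward's criterion can be sketched in a few lines. The following is a minimal, illustrative Python implementation of the Ward1 convention, not a reproduction of any of the R functions discussed in the paper; the function name and O(n^3) structure are choices made purely for illustration. It takes squared Euclidean dissimilarities as input and applies the Ward update d(k, i∪j) = [(n_i+n_k) d(k,i) + (n_j+n_k) d(k,j) − n_k d(i,j)] / (n_i+n_j+n_k) at each agglomeration:

```python
import numpy as np

def ward1_heights(X):
    """Naive O(n^3) agglomeration with the Lance-Williams update for
    Ward's criterion, applied to *squared* Euclidean dissimilarities
    (the Ward1 convention). Returns the sequence of node heights."""
    n = len(X)
    # pairwise squared Euclidean dissimilarities
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1).astype(float)
    size = {i: 1 for i in range(n)}   # cluster cardinalities n_i
    active = set(range(n))
    heights = []
    while len(active) > 1:
        # merge the closest pair of active clusters
        d_ij, i, j = min((D[a, b], a, b) for a in active for b in active if a < b)
        heights.append(d_ij)
        ni, nj = size[i], size[j]
        # Lance-Williams update for Ward's criterion:
        # d(k, i U j) = ((ni+nk) d(k,i) + (nj+nk) d(k,j) - nk d(i,j)) / (ni+nj+nk)
        for k in active - {i, j}:
            nk = size[k]
            D[i, k] = D[k, i] = ((ni + nk) * D[i, k]
                                 + (nj + nk) * D[j, k]
                                 - nk * d_ij) / (ni + nj + nk)
        size[i] = ni + nj
        active.discard(j)
    return heights
```

For the three one-dimensional points 0, 1, 5 this gives heights 1 and 27; each height equals twice the increase in within-cluster sum of squared error at that merge, which is why heights on this scale must be square-rooted before being compared with Ward2-style output.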
The paper emphasizes the importance of understanding these differences to ensure accurate and consistent results in data analysis.
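The claim that Ward's criterion minimizes the increase in within-cluster sum of squared error can be checked numerically. The sketch below uses SciPy's linkage as a stand-in for the R functions discussed in the paper (SciPy's ward method is a Ward2-style implementation: it takes raw coordinates and reports node heights on the unsquared Euclidean scale) and verifies, merge by merge, that the square of each node height equals twice the increase in within-cluster SSE:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def within_sse(points):
    """Sum of squared deviations from the centroid."""
    return ((points - points.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 3))

# Ward2-style clustering: heights are on the (unsquared) Euclidean scale
Z = linkage(X, method="ward")

# Replay the merges recorded in Z, tracking each cluster's members.
members = {i: [i] for i in range(len(X))}
for t, (a, b, height, _) in enumerate(Z):
    ma, mb = members.pop(int(a)), members.pop(int(b))
    delta = within_sse(X[ma + mb]) - within_sse(X[ma]) - within_sse(X[mb])
    # squared Ward2 height = twice the increase in within-cluster SSE
    assert np.isclose(height ** 2, 2 * delta)
    members[len(X) + t] = ma + mb
```

Squaring these heights recovers the Ward1 scale, while taking square roots of Ward1 heights recovers this scale, which is the practical content of the equivalences discussed above.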