December 13, 2011 | Fionn Murtagh (1) and Pierre Legendre (2)
The article by Fionn Murtagh and Pierre Legendre examines Ward's hierarchical clustering method, which has been widely used since Ward introduced it in 1963. The authors show that the Ward agglomerative algorithm has been interpreted and implemented differently across software systems, particularly in R. To compare implementations, three things must be understood: the input dissimilarities, the loop structure of the algorithm, and the node heights of the output dendrogram. The article compares two implementations in detail, Ward1 and Ward2. Ward1 applies the cluster update formula directly to the input dissimilarities, so it expects squared Euclidean distances as input. Ward2 takes unsquared Euclidean distances, squares them inside the update, and reports the square root as the node height. Despite these differences, the two produce the same clustering topology when each is given its appropriate input, and the Ward2 node heights are the square roots of the Ward1 node heights.
The authors also discuss what these differences mean for software developers and users, and stress the need for clear documentation. They conclude by recommending that developers who offer only the Ward1 algorithm explain clearly how to obtain the Ward2 output.
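
The relation between the two variants can be illustrated with a small sketch. It is not the authors' code or the R implementation. It is a minimal pure-Python agglomeration that assumes the standard Lance-Williams form of the Ward update. The function names `euclidean_dissimilarities` and `ward_agglomerate`, the flag `square_internally`, and the five sample points are all hypothetical, chosen only for illustration.

```python
import math

def euclidean_dissimilarities(points):
    """Return {(i, j): Euclidean distance} for all pairs i < j."""
    n = len(points)
    return {(i, j): math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}

def ward_agglomerate(dissim, n, square_internally):
    """Agglomerate n objects using the Lance-Williams form of the Ward update.

    square_internally=False: update applied to the inputs as given (Ward1-style).
    square_internally=True: inputs squared inside the update and the square
    root stored (Ward2-style).
    Returns a list of (members_of_new_cluster, merge_height).
    """
    d = {frozenset(pair): value for pair, value in dissim.items()}
    size = {i: 1 for i in range(n)}
    members = {i: (i,) for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        closest = min(d, key=d.get)  # pair with the smallest dissimilarity
        i, j = sorted(closest)
        d_ij = d.pop(closest)
        n_i, n_j = size.pop(i), size.pop(j)
        for k, n_k in size.items():
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            if square_internally:
                a, b, c = d_ki ** 2, d_kj ** 2, d_ij ** 2
            else:
                a, b, c = d_ki, d_kj, d_ij
            updated = ((n_i + n_k) * a + (n_j + n_k) * b - n_k * c) / (n_i + n_j + n_k)
            if square_internally:
                # guard against tiny negative values caused by rounding
                updated = math.sqrt(max(updated, 0.0))
            d[frozenset((k, next_id))] = updated
        size[next_id] = n_i + n_j
        members[next_id] = tuple(sorted(members.pop(i) + members.pop(j)))
        merges.append((members[next_id], d_ij))
        next_id += 1
    return merges

# Made-up illustrative data, not taken from the article.
points = [(0, 0), (1, 0), (5, 0), (5, 3), (12, 1)]
dist = euclidean_dissimilarities(points)
squared = {pair: value ** 2 for pair, value in dist.items()}

ward1 = ward_agglomerate(squared, len(points), square_internally=False)
ward2 = ward_agglomerate(dist, len(points), square_internally=True)

# Compare merge order and check that Ward2 heights are square roots of Ward1 heights.
for (members1, h1), (members2, h2) in zip(ward1, ward2):
    print(members1 == members2, math.isclose(math.sqrt(h1), h2))
```

In this sketch the Ward1-style run receives squared distances and the Ward2-style run receives the unsquared ones, which is the pairing of inputs under which the two are expected to agree on topology. Ties between equal dissimilarities are broken by dictionary order, so data containing ties may need more careful handling.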