Understanding Fast R Functions for Robust Correlations and Hierarchical Clustering.

The article presents efficient R functions for calculating Pearson correlation and a robust alternative, the biweight midcorrelation, for high-throughput biological data analyses. The standard R function for Pearson correlation is efficient for datasets with no missing values but is slower when even a small number of entries are missing. The authors propose an implementation that significantly speeds up the calculation in that case; it is part of the updated R package WGCNA. They also parallelize all calculations to gain further speed on systems with parallel processing capabilities. The WGCNA package provides the functions `cor` and `bicor` for these calculations, and the article discusses the robustness of the biweight midcorrelation and its application in weighted gene co-expression network analysis (WGCNA).

For hierarchical clustering, the article reports that the standard R function `hclust` has a complexity of \( n^3 \), where \( n \) is the number of objects being clustered. The authors introduce the `flashClust` package, which implements a more efficient algorithm with a complexity of approximately \( n^2 \), leading to substantial time savings when clustering large datasets.

The article includes detailed examples and timing comparisons to demonstrate the performance gains achieved by the new functions.
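To make the robustness of the biweight midcorrelation concrete, here is a minimal base-R sketch for two numeric vectors. It assumes the commonly cited definition: deviations from the median are scaled by 9 times the raw median absolute deviation, and observations whose scaled deviation reaches 1 in absolute value receive weight zero. The helper name `bicor_sketch` is hypothetical. This is not the WGCNA implementation, which works on whole matrices and is optimized and parallelized.

```r
# Base-R sketch of the biweight midcorrelation of two numeric vectors.
# bicor_sketch is a hypothetical helper for illustration only.
bicor_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))

  # Keep only pairs where both values are present.
  ok <- !(is.na(x) | is.na(y))
  x <- x[ok]
  y <- y[ok]

  # Weighted, normalized deviations from the median for one vector.
  weighted_dev <- function(v) {
    med <- median(v)
    raw_mad <- mad(v, constant = 1)  # raw MAD, no normal-consistency scaling
    if (raw_mad == 0) stop("MAD is zero; biweight weights are undefined")
    u <- (v - med) / (9 * raw_mad)
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)  # far outliers get weight 0
    a <- (v - med) * w
    a / sqrt(sum(a^2))
  }

  sum(weighted_dev(x) * weighted_dev(y))
}

# Compare Pearson correlation and the sketch with and without one outlier.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.3)
y_out <- y
y_out[1] <- 100  # a single gross outlier

pearson_clean <- cor(x, y)
pearson_out   <- cor(x, y_out)
bicor_clean   <- bicor_sketch(x, y)
bicor_out     <- bicor_sketch(x, y_out)

print(c(pearson_clean = pearson_clean, pearson_out = pearson_out,
        bicor_clean = bicor_clean, bicor_out = bicor_out))
```

The single outlier shifts the Pearson estimate far more than the biweight estimate, because the outlying observation is given zero weight. In practice one would call `bicor` from the WGCNA package on a data matrix rather than a hand-written helper like this one.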

Fast R Functions for Robust Correlations and Hierarchical Clustering

March 2012 | Peter Langfelder, Steve Horvath