Pvclust: an R package for assessing the uncertainty in hierarchical clustering

April 4, 2006 | Ryota Suzuki and Hidetoshi Shimodaira
Pvclust is an R package for assessing the uncertainty in hierarchical clustering. For each cluster in the dendrogram it calculates p-values by bootstrap resampling, reporting two types: the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. The AU p-value is computed by multiscale bootstrap resampling and is less biased than the BP value, which is computed by ordinary bootstrap resampling; this is supported by asymptotic theory. Pvclust also supports parallel computing to reduce computation time. The package is freely available under the GPL and can be installed from CRAN.

Pvclust is designed for general hierarchical clustering problems, so users can obtain bootstrap-based p-values for their own datasets. Bootstrap assessment of this kind is widely used in phylogenetic analysis, where bootstrap samples are used to assess the reliability of clustering results. In multiscale bootstrap resampling, the size of the bootstrap sample is set to several different values, and the AU p-value is then computed from a theoretical curve fitted to the values observed at these sizes. Currently, Pvclust implements only the simplest form of bootstrapping, the non-parametric bootstrap. More complex models for specific applications, such as DNA microarray analysis, are planned for future development.

As an example, Pvclust is applied to DNA microarray data, the data of Garber et al. (2001), to assess the reliability of the clustering results. The package provides graphical and text-based interfaces for examining the p-values and their standard errors. Results are visualized in plots in which the AU and BP values attached to each cluster indicate its reliability. A run with nboot = 1000 is recommended for initial testing, followed by nboot = 10000 for smaller errors. The standard errors of the p-values help in choosing an appropriate number of bootstrap replicates B (nboot).
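
The curve fitting behind the AU p-value can be illustrated with a short, language-independent sketch, written here in Python. It is not Pvclust's own code. It assumes the standard multiscale bootstrap model, in which the z-value z(r) = -Φ⁻¹(BP(r)) observed at relative sample size r = n'/n follows v√r + c/√r, so that AU = 1 - Φ(v - c) and the fitted BP at r = 1 is 1 - Φ(v + c). The function names, the ten scales, and the BP values are made up for illustration, and ordinary least squares is used as a simplification. A binomial standard error for BP is included to show how the error shrinks as nboot grows.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def fit_multiscale_curve(scales, bps):
    """Fit z(r) = v*sqrt(r) + c/sqrt(r) by ordinary least squares.

    scales: relative bootstrap sample sizes r = n'/n
    bps:    bootstrap probability of one cluster observed at each scale
    Returns the fitted coefficients (v, c).
    """
    # Convert each BP to a z-value; skip 0 and 1, where the inverse CDF is undefined.
    points = [(r, -STD_NORMAL.inv_cdf(bp))
              for r, bp in zip(scales, bps) if 0.0 < bp < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two scales with 0 < BP < 1")

    # Normal equations for the two regressors sqrt(r) and 1/sqrt(r).
    s_rr = sum(r for r, _ in points)        # sum of sqrt(r)^2
    s_rq = float(len(points))               # sum of sqrt(r) * (1/sqrt(r))
    s_qq = sum(1.0 / r for r, _ in points)  # sum of (1/sqrt(r))^2
    b_r = sum(sqrt(r) * z for r, z in points)
    b_q = sum(z / sqrt(r) for r, z in points)

    det = s_rr * s_qq - s_rq * s_rq
    if det < 1e-12:
        raise ValueError("scales must not all be equal")
    v = (s_qq * b_r - s_rq * b_q) / det
    c = (s_rr * b_q - s_rq * b_r) / det
    return v, c


def au_and_bp(v, c):
    """Return (AU, BP): AU = 1 - Phi(v - c), fitted BP at r = 1 is 1 - Phi(v + c)."""
    au = 1.0 - STD_NORMAL.cdf(v - c)
    bp = 1.0 - STD_NORMAL.cdf(v + c)
    return au, bp


def bp_standard_error(bp, nboot):
    """Binomial standard error of a BP value estimated from nboot replicates."""
    return sqrt(bp * (1.0 - bp) / nboot)


if __name__ == "__main__":
    # Hypothetical BP values for one cluster, generated from v = -1.5, c = 0.5.
    scales = [round(0.5 + 0.1 * i, 1) for i in range(10)]  # 0.5, 0.6, ..., 1.4
    bps = [1.0 - STD_NORMAL.cdf(-1.5 * sqrt(r) + 0.5 / sqrt(r)) for r in scales]

    v, c = fit_multiscale_curve(scales, bps)
    au, bp = au_and_bp(v, c)
    print(f"v = {v:.3f}, c = {c:.3f}, AU = {au:.3f}, BP = {bp:.3f}")
    for nboot in (1000, 10000):
        print(f"nboot = {nboot}: SE(BP) = {bp_standard_error(bp, nboot):.4f}")
```

In this made-up case the fit recovers v = -1.5 and c = 0.5, giving an AU p-value of about 0.977 against a BP value of about 0.841. Going from nboot = 1000 to nboot = 10000 reduces the binomial standard error of BP by a factor of √10, which is the motivation for the larger run once initial testing looks reasonable.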