ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking

April 28, 2010 | Matthew D. Wilkerson and D. Neil Hayes
ConsensusClusterPlus is an R-based software tool for unsupervised class discovery in cancer research. It implements the consensus clustering (CC) method, which provides quantitative and visual stability evidence for estimating the number of classes in a dataset. The tool extends CC with new features, namely item tracking, item-consensus (IC) plots and cluster-consensus (CLC) plots, which support more specific decisions in unsupervised class discovery. The software is open source and available through Bioconductor.

ConsensusClusterPlus takes a data matrix and user-specified options as input. The data matrix holds feature measurements for a set of samples (items), such as gene expression values. The output provides stability evidence for each candidate number of groups (k) together with the corresponding cluster assignments, and is delivered as R data objects, text files, graphical plots, and a log file.

The algorithm extends CC in two ways: it allows two-dimensional subsampling, in which both features and items are subsampled according to user-specified distributions, and it accepts a custom clustering algorithm. For each k, it computes pairwise consensus values (the proportion of subsampling iterations in which two items that were sampled together were also clustered together) and stores them in a consensus matrix (CM). A final agglomerative hierarchical clustering is then performed using 1 minus the consensus value as the distance, and the resulting tree is cut into k groups, the consensus clusters.

The software produces graphical plots that extend the CC visualizations. CM plots show consensus values on a white-to-blue color scale, with items ordered by the consensus clustering and each item's consensus cluster marked by a colored rectangle. CDF plots show the cumulative distribution of consensus values for each k; the aim is to find the k at which the distribution reaches an approximate maximum, which indicates maximum stability. Item-tracking plots show each item's consensus cluster assignment at each k, so users can follow how assignments change as k grows and identify promiscuous items whose assignments change frequently. IC plots display items as vertical bars whose heights are the item's IC values, that is, the mean consensus between the item and the members of a consensus cluster. CLC plots show, for each consensus cluster, the mean pairwise consensus value among its members.
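To make the consensus computation concrete, here is a minimal from-scratch sketch in Python. It is not the ConsensusClusterPlus API (the package itself is R), and every function name below (consensus_matrix, consensus_clusters, item_consensus, and so on) is hypothetical. Assumptions: items are rows of `data`; items and features are subsampled uniformly without weights; both the inner clusterer (on Euclidean distance) and the final clusterer (on 1 minus consensus) are average-linkage agglomerative clustering. The package's actual options and defaults may differ.

```python
import random
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def average_linkage(dist, k):
    """Average-linkage agglomerative clustering on a square distance matrix.
    Merging stops when k clusters remain, which matches cutting the tree at k.
    Returns one integer label (0..k-1) per row of `dist`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def consensus_matrix(data, k, reps=50, p_item=0.8, p_feature=1.0, seed=1):
    """Hypothetical helper: consensus matrix for one k. Each repetition
    subsamples items AND features (2D subsampling, uniform here), clusters
    the subsample, and records which co-sampled pairs share a cluster."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    together = [[0] * n for _ in range(n)]  # times clustered together
    sampled = [[0] * n for _ in range(n)]   # times sampled together
    for _ in range(reps):
        items = sorted(rng.sample(range(n), max(k, int(p_item * n))))
        feats = sorted(rng.sample(range(m), max(1, int(p_feature * m))))
        sub = [[data[i][f] for f in feats] for i in items]
        dist = [[euclidean(u, v) for v in sub] for u in sub]
        labels = average_linkage(dist, k)
        for a, i in enumerate(items):
            for b, j in enumerate(items):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def consensus_clusters(M, k):
    """Final clustering: distance is 1 minus consensus, tree cut to k groups."""
    return average_linkage([[1.0 - x for x in row] for row in M], k)


def consensus_cdf(M, c):
    """Empirical CDF of the off-diagonal consensus values, evaluated at c."""
    vals = [M[i][j] for i, j in combinations(range(len(M)), 2)]
    return sum(v <= c for v in vals) / len(vals)


def item_consensus(M, labels, i, cluster):
    """Mean consensus between item i and the other members of `cluster`."""
    others = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
    return sum(M[i][j] for j in others) / len(others) if others else 0.0


def cluster_consensus(M, labels, cluster):
    """Mean pairwise consensus among the members of `cluster`."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    pairs = list(combinations(members, 2))
    return sum(M[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    # Toy data: two well-separated groups of three items, two features each.
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
            [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]]
    M = consensus_matrix(data, k=2)
    labels = consensus_clusters(M, 2)
    print(labels)                           # [0, 0, 0, 1, 1, 1]
    print(consensus_cdf(M, 0.5))            # 0.6
    print(cluster_consensus(M, labels, 0))  # 1.0
```

With clean, well-separated groups the consensus values are all 0 or 1, so the CDF is a step and both IC and CLC are at their extremes; noisier data yields intermediate values. Calling `consensus_clusters` for several k and comparing the label vectors gives a rough analogue of item tracking, although cluster labels are arbitrary between k values and would need aligning first.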
ConsensusClusterPlus is Bioconductor-compatible, open-source software for unsupervised class discovery that extends CC with new, easy-to-use functionality and visualizations for detailed analysis.