STAMP: statistical analysis of taxonomic and functional profiles

July 23, 2014 | Donovan H. Parks¹, Gene W. Tyson¹,², Philip Hugenholtz¹,³ and Robert G. Beiko⁴
STAMP is a graphical software package that provides statistical hypothesis tests and exploratory plots for analyzing taxonomic and functional profiles. It supports tests for comparing pairs of samples or samples organized into two or more treatment groups. Effect sizes and confidence intervals are provided to allow critical assessment of the biological relevancy of test results. A user-friendly graphical interface permits easy exploration of statistical results and generation of publication-quality plots. STAMP is licensed under the GNU GPL. Python source code and binaries are available from the website http://kiwi.cs.dal.ca/software/stamp.

The original release of STAMP was limited to comparing a single pair of taxonomic or functional profiles. This release adds statistical tests and plots for assessing differences between two or more treatment groups along with increased compatibility with popular bioinformatic software. STAMP can process functional and taxonomic profiles produced by QIIME, PICRUSt, MG-RAST, IMG/M, and RITA. Custom profiles can also be specified as a tab-separated values file. STAMP can process input files containing hundreds of samples spanning thousands of features with a standard desktop computer.

Statistical hypothesis tests include Welch's t-test and White's nonparametric t-test for comparing profiles organized into two groups. STAMP implements ANOVA and the Kruskal-Wallis H-test for comparing three or more groups of profiles. Statistically significant features can be further examined with post hoc tests to determine which groups of profiles differ from each other. Effect sizes and confidence intervals are provided for all statistical tests to aid in determining features with biologically relevant differences between groups. Features can be filtered based on their p-value, effect size or prevalence within a group of profiles to create plots focused on features likely to be biologically relevant.
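As a rough illustration of the two-group comparison and effect-size filtering described above, the sketch below computes Welch's t statistic, the Welch-Satterthwaite degrees of freedom and a difference-in-means effect size for each feature, then keeps only features whose effect size passes a threshold. This is not STAMP's own code: the feature names, abundance values, the helper names `welch_t` and `filter_by_effect`, and the threshold of 1.0 percentage points are all assumptions made for the example.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom, difference in means).

    Uses Welch's unequal-variance formulation with the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1)
    diff = mean(group_a) - mean(group_b)
    se2 = va / na + vb / nb  # squared standard error of the difference
    t = diff / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df, diff


def filter_by_effect(profiles, min_effect):
    """Keep features whose absolute difference in means is >= min_effect."""
    kept = []
    for name, (group_a, group_b) in profiles.items():
        _, _, diff = welch_t(group_a, group_b)
        if abs(diff) >= min_effect:
            kept.append(name)
    return kept


# Hypothetical relative abundances (%) of two features in two groups.
profiles = {
    "feature_A": ([10.0, 12.0, 11.0, 13.0], [20.0, 22.0, 21.0, 23.0]),
    "feature_B": ([5.0, 5.5, 4.5, 5.0], [5.2, 4.8, 5.1, 4.9]),
}

for name, (a, b) in profiles.items():
    t, df, diff = welch_t(a, b)
    print(f"{name}: t={t:.3f}, df={df:.2f}, difference in means={diff:.2f}")

kept = filter_by_effect(profiles, min_effect=1.0)
print("features passing the effect-size filter:", kept)
```

A p-value would come from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, and could be combined with the effect-size threshold as a second filter.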
Numerous publication-quality plots can be produced using STAMP, including principal component analysis (PCA) plots, bar plots, box-and-whisker plots, scatter plots and heat maps. Extended error bar plots provide a single figure indicating statistically significant features along with p-values, effect sizes and confidence intervals.

The CBM communities section notes that the metabolic activity of microbial communities has been implicated as a major source of methane in many coal bed methane (CBM) reservoirs. STAMP was used to examine the taxonomic profiles of 44 CBM communities sampled from drilled cores, shallow and deep core cuttings, and produced waters. A PCA plot indicates that communities from shallow core cuttings are relatively distinct, and specific families were found to be overrepresented in these communities. A PCA plot coloured by the company performing the drilling reveals secondary clustering of shallow core cuttings, indicating that the difference between CBM samples may be the result of secondary factors such as collection protocols or geography, as opposed to different niches within the CBM environment.

The Melainabacteria genomes section notes that the Melainabacteria are a recently discovered and highly diverse group of bacteria that form a sister class or phylum to the Cyanobacteria. STAMP was used to compare COG profiles of the non-photosynthetic Melainabacteria with the