Robust methods for differential abundance analysis of marker gene survey data

September 4, 2013 | Joseph Nathaniel Paulson, O. Colin Stine, Héctor Corrada Bravo, & Mihai Pop
This supplementary note presents a simulation study and a detailed comparison of differential abundance methods on oral microbiome data from the Human Microbiome Project, along with a discussion of rarefaction and of ambiguous read assignment to OTUs. The paper introduces a zero-inflated Gaussian (ZIG) mixture model to account for the excess zeros in marker gene survey count data. The model uses a binomial process to express the probability of a zero count as a function of a sample's total counts, and it estimates its parameters with an expectation-maximization algorithm.
The ZIG model is used to estimate fold-changes and to test for differential abundance between groups, using a moderated t-statistic based on empirical Bayes methods. It is compared with other differential abundance detection methods, including Metastats, LEfSe, DESeq, and edgeR. In that comparison ZIG gives more accurate fold-change estimates and detects differentially abundant OTUs more reliably, particularly for sparse features. The paper also discusses the impact of ambiguous read assignment on differential abundance analysis and the importance of normalization in reducing bias. It concludes that the ZIG model is a robust method for differential abundance analysis in marker gene surveys, accounting for zero-inflated data and confounding factors.
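To make the mixture-plus-EM idea above concrete, here is a minimal sketch for a single feature. It is not the authors' implementation. It assumes log2(count + 1) abundances, one two-level group covariate, and a logistic link between the zero probability and log sequencing depth. It leaves out the normalization term and the moderated t-statistic. All names (`simulate_feature`, `fit_zero_prob`, `fit_zig`) and the simulated parameter values are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor so exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def simulate_feature(n=400, b0=3.0, b1=1.5, sigma=0.5, seed=0):
    """Hypothetical toy data: Gaussian log-abundances with depth-dependent zeros."""
    rng = np.random.default_rng(seed)
    group = np.arange(n) % 2                      # two groups, alternating samples
    log_depth = rng.normal(8.0, 0.7, size=n)      # assumed log total counts per sample
    p_zero = sigmoid(-0.5 - 1.5 * (log_depth - 8.0))  # shallow samples drop out more
    y = rng.normal(b0 + b1 * group, sigma)
    y[rng.random(n) < p_zero] = 0.0
    return y, group, log_depth

def fit_zero_prob(log_depth, z, n_iter=25):
    """Logistic regression of responsibilities z on centered log depth (Newton steps)."""
    X = np.column_stack([np.ones_like(log_depth), log_depth - log_depth.mean()])
    coef = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ coef)
        w = np.clip(p * (1.0 - p), 1e-6, None)
        hess = (X * w[:, None]).T @ X + 1e-8 * np.eye(2)
        coef = coef + np.linalg.solve(hess, X.T @ (z - p))
    return coef, sigmoid(X @ coef)

def fit_zig(y, group, log_depth, n_iter=50, tol=1e-8):
    """EM for a point mass at zero mixed with N(b0 + b1 * group, sigma2)."""
    y = np.asarray(y, dtype=float)
    log_depth = np.asarray(log_depth, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(group, dtype=float)])
    is_zero = y == 0
    # Initialise the Gaussian component from the non-zero observations.
    beta = np.linalg.lstsq(X[~is_zero], y[~is_zero], rcond=None)[0]
    sigma2 = np.var(y[~is_zero] - X[~is_zero] @ beta)
    pi = np.full(y.shape, 0.5)
    zero_coef = np.zeros(2)
    for _ in range(n_iter):
        # E-step: posterior probability that each zero came from the point mass.
        dens = np.exp(-0.5 * (y - X @ beta) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        z = np.where(is_zero, pi / (pi + (1.0 - pi) * dens), 0.0)
        # M-step: weighted least squares for the count component, weights 1 - z.
        w = 1.0 - z
        XtW = X.T * w
        new_beta = np.linalg.solve(XtW @ X, XtW @ y)
        sigma2 = np.sum(w * (y - X @ new_beta) ** 2) / np.sum(w)
        # M-step: refit the depth-dependent zero probability.
        zero_coef, pi = fit_zero_prob(log_depth, z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return {"beta": beta, "sigma2": sigma2, "zero_coef": zero_coef, "z": z}

if __name__ == "__main__":
    y, group, log_depth = simulate_feature()
    fit = fit_zig(y, group, log_depth)
    naive = y[group == 1].mean() - y[group == 0].mean()
    print("ZIG log2 fold-change estimate:", fit["beta"][1])
    print("naive difference of means (zeros included):", naive)
```

In this toy setup the naive difference of means treats every zero as a real measurement, so it tends to be pulled toward zero. The mixture fit down-weights observations attributed to the zero component before estimating the group effect, which is the behaviour the summary credits for more accurate fold-change estimates on sparse features.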