29 May 2015 | Siddhartha Mandal, Will Van Treuren, Richard A. White, Merete Eggesbo, Rob Knight, Shyamal D. Peddada
The paper introduces a statistical framework called Analysis of Composition of Microbiomes (ANCOM) to address the challenges in studying microbial composition. ANCOM accounts for the compositional constraints in microbiome data, which lie in a simplex rather than in Euclidean space, and aims to reduce false discoveries while maintaining high statistical power. The method makes no distributional assumptions and can be implemented in a linear model framework to adjust for covariates and to model longitudinal data. In extensive simulations, ANCOM controls the false discovery rate (FDR) better, and achieves higher power, than the standard t-test and the Zero Inflated Gaussian (ZIG) methodology. The authors demonstrate ANCOM on two publicly available datasets of human gut microbiota, highlighting its ability to detect compositional differences and improve biological insights. The method is also shown to be computationally efficient, making it suitable for large-scale data sets.
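To make the compositional constraint concrete, the following minimal sketch shows why testing relative abundances directly can produce false discoveries. It is an illustration only, not the ANCOM procedure: the abundance values, the helper names `closure` and `log_ratio`, and the three-taxon setup are all hypothetical. It uses log-ratios, a standard device in compositional data analysis, to show a quantity that is unaffected by rescaling to the simplex.

```python
import math


def closure(abundances):
    """Rescale non-negative abundances so they sum to 1 (a point on the simplex)."""
    total = sum(abundances)
    return [a / total for a in abundances]


def log_ratio(composition, i, j):
    """Log of component i over component j; the common scaling factor cancels."""
    return math.log(composition[i] / composition[j])


# Hypothetical absolute abundances for three taxa under two conditions.
# Only taxon 0 truly changes; taxa 1 and 2 are identical in both.
control = [100.0, 200.0, 700.0]
treated = [1000.0, 200.0, 700.0]

rel_control = closure(control)
rel_treated = closure(treated)

# The relative abundance of the unchanged taxon 1 drops anyway, because
# all components are forced to sum to 1.
shift_taxon1 = rel_treated[1] - rel_control[1]

# The log-ratio between the two unchanged taxa is the same in both conditions.
lr_control = log_ratio(rel_control, 1, 2)
lr_treated = log_ratio(rel_treated, 1, 2)

print("shift in relative abundance of taxon 1:", shift_taxon1)
print("log-ratio taxon1/taxon2 (control, treated):", lr_control, lr_treated)
```

In this toy setup a naive per-taxon comparison of relative abundances would flag taxa 1 and 2 as different even though only taxon 0 changed, whereas the log-ratio between taxa 1 and 2 stays constant.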