November 16, 2021 | Himel Mallick, Ali Rahnavard, Lauren J. McIver, Siyuan Ma, Yancong Zhang, Long H. Nguyen, Timothy L. Tickle, George Weingart, Boyu Ren, Emma H. Schwager, Suvo Chatterjee, Kelsey N. Thompson, Jeremy E. Wilkinson, Ayshwarya Subramanian, Yiren Lu, Levi Waldron, Joseph N. Paulson, Eric A. Franzosa, Hector Corrada Bravo, Curtis Huttenhower
This study introduces MaAsLin 2, a statistical method for identifying multivariable associations between microbial community features and complex metadata in population-scale observational studies. MaAsLin 2 uses generalized linear and mixed models to accommodate a wide range of modern epidemiological studies, including cross-sectional and longitudinal designs, and various data types (e.g., counts and relative abundances), with or without covariates and repeated measurements. The method was evaluated through large-scale simulations under a variety of scenarios, which showed that MaAsLin 2 preserves statistical power in the presence of repeated measures and multiple covariates while accounting for the nuances of meta-omics features and controlling false discovery. The method was also applied to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2), revealing a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
The software packages used in this work are free and open source, including bioBakery methods available via http://huttenhower.sph.harvard.edu/biobakery as source code, cloud-compatible images, and installable packages. Analysis scripts using these packages to generate figures and results from this manuscript (and associated usage notes) are available from https://github.com/biobakery/maaslin2_benchmark. The iHMP dataset is publicly available at the IBDMDB website (https://ibdmdb.org) and the HMP DACC web portal (https://www.hmpdacc.org/ihmp/). The processed HMP2 datasets analysed in this manuscript are also available as Supporting Information. Funding for this work was provided by several US National Science Foundation and National Institutes of Health grants. The authors have declared competing interests, and one author was unable to confirm their authorship contributions.
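To make the "one multivariable model per feature" idea concrete, here is a simplified sketch in plain Python. It is an illustration, not MaAsLin 2's own code. It assumes total-sum scaling, a log2 transform with a small pseudocount (1e-5, an arbitrary choice), and fixed effects only, so there are no random effects for repeated measures. The function names and the `disease` and `age` variables are hypothetical. P-values would come from a t distribution with n - p degrees of freedom; they are omitted here to keep the example within the standard library.

```python
import math


def tss(sample):
    """Total-sum scaling: convert one sample's counts to relative abundances."""
    total = sum(sample)
    return [x / total for x in sample]


def log_transform(values, pseudocount):
    """log2 with an additive pseudocount (assumed choice; zeros would break log)."""
    return [math.log2(v + pseudocount) for v in values]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and their standard errors."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    sigma2 = sum(r * r for r in residuals) / (n - p)
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(p)]
    return beta, se


def per_feature_models(counts, disease, age, pseudocount=1e-5):
    """Hypothetical helper: fit log2(abundance) ~ 1 + disease + age for each feature.

    counts: list of samples, each a list of per-feature counts.
    Returns {feature_index: (disease_coefficient, t_statistic)}.
    """
    rel = [tss(sample) for sample in counts]
    X = [[1.0, float(d), float(a)] for d, a in zip(disease, age)]
    results = {}
    for f in range(len(counts[0])):
        y = log_transform([sample[f] for sample in rel], pseudocount)
        beta, se = fit_ols(X, y)
        # A perfectly fitted (e.g. constant) feature has zero standard error.
        t = beta[1] / se[1] if se[1] > 0 else float("nan")
        results[f] = (beta[1], t)
    return results
```

Each feature gets its own regression, adjusted for the same covariate. For longitudinal data, this fixed-effects fit would need to be replaced by a mixed model with, for example, a per-subject random intercept. That step requires a dedicated statistics library and is beyond this sketch.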
The study demonstrates that MaAsLin 2 is a robust and flexible method for identifying multivariable associations in population-scale microbiome studies, with the ability to control false discovery rates and maintain statistical power in the presence of repeated measures and multiple covariates. The method was validated through extensive simulations and an application to HMP2 IBD multi-omics data, and is available as an R/Bioconductor package at https://huttenhower.sph.harvard.edu/maaslin2. The study highlights the importance of controlling false discovery rates in microbiome research and the need for robust statistical methods to accurately identify associations between microbial community features and complex metadata in population-scale studies.
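False discovery control across many per-feature tests is commonly done with the Benjamini-Hochberg (BH) procedure. This sketch assumes BH is the correction in use and computes adjusted p-values (q-values) in plain Python. It is illustrative and not taken from the MaAsLin 2 package; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Features whose q-value falls below a chosen threshold would then be reported as significant associations. The threshold is an analysis choice and is not fixed by this sketch.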