Normalization and microbial differential abundance strategies depend upon data characteristics

2017 | Sophie Weiss, Zhenjiang Zech Xu, Shyamal Peddada, Amnon Amir, Kyle Bittinger, Antonio Gonzalez, Catherine Lozupone, Jesse R. Zaneveld, Yoshiki Vázquez-Baeza, Amanda Birmingham, Embriette R. Hyde, Rob Knight
This study evaluates how data characteristics affect normalization and differential abundance methods for microbiome data. Both kinds of technique must account for the compositional nature of the data: relative abundances are constrained to a simplex (they sum to 1) and are not free to vary in Euclidean space. Normalization methods such as rarefying standardize library sizes across samples but can introduce biases when sequencing depth varies between samples. Alternative normalization methods such as scaling may overestimate or underestimate the fraction of zeros, which distorts OTU correlations. Aitchison's log-ratio transformation suits compositional data but cannot be applied directly to zeros; adding pseudocounts is one way to address this.

Differential abundance testing methods such as ANCOM are effective for detecting differentially abundant taxa, especially when sample sizes are large. Nonparametric methods such as the Mann-Whitney test may not account for compositional effects, which inflates false discovery rates. Parametric models such as DESeq2 and edgeR can provide higher sensitivity but may have higher false discovery rates when library sizes are large or uneven. The study shows that rarefying can reduce false discovery rates for groups with large library size differences, but it may lower sensitivity because it discards data. ANCOM is the only method tested that maintains good control of false discovery rates for ecosystem-level inferences about taxon abundance.

The study concludes that normalization and differential abundance techniques should be selected based on data characteristics. Rarefying remains a useful technique for sample normalization, especially for presence/absence distance metrics. For weighted distance measures, and when sequencing depth is not a confounding variable, other methods show promise.
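The rarefying step mentioned above can be illustrated with a minimal sketch. It assumes the common definition of rarefying: choose a target depth, discard samples with fewer reads, and subsample the remaining samples without replacement. The function name `rarefy`, the toy table, and the depth of 100 are hypothetical and are not taken from the study.

```python
import random
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int,
           seed: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Subsample one sample's reads without replacement down to `depth`.

    Returns None when the sample has fewer than `depth` reads, mirroring
    the usual practice of discarding samples below the chosen depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw `depth`
    # reads without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


# Hypothetical toy table: three samples with very different library sizes.
table = {
    "sample_a": {"otu1": 500, "otu2": 300, "otu3": 200},
    "sample_b": {"otu1": 60, "otu2": 30, "otu3": 10},
    "sample_c": {"otu1": 20, "otu2": 5, "otu3": 0},
}
depth = 100
rarefied_table = {name: rarefy(c, depth, seed=0) for name, c in table.items()}
```

In this toy table, `sample_a` loses 900 of its 1,000 reads and `sample_c` is dropped entirely, which is the kind of data loss the summary links to reduced sensitivity.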
The study emphasizes the need for better solutions to the zero problem and more research on the effects of compositional data on beta-diversity analysis. Overall, the findings guide the choice of normalization and differential abundance techniques based on the specific characteristics of the study data.
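To make the zero problem concrete, the sketch below applies the centered log-ratio (CLR) transform, one of Aitchison's log-ratio transformations, with a pseudocount added to every count. The function name `clr`, the default pseudocount of 1.0, and the example counts are illustrative assumptions and are not values recommended by the study.

```python
import math
from typing import List


def clr(counts: List[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio transform of one sample.

    A pseudocount is added to every count so that zeros have a defined
    logarithm. Each value is the log of the shifted count minus the mean
    log over all taxa (the log of the geometric mean).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical sample containing a zero count.
sample = [0, 10, 90]
clr_one = clr(sample, pseudocount=1.0)
clr_half = clr(sample, pseudocount=0.5)

# Without zeros and without a pseudocount, CLR ignores total library size:
# scaling every count by 10 leaves the transformed values unchanged.
small = clr([1, 2, 4], pseudocount=0.0)
large = clr([10, 20, 40], pseudocount=0.0)
```

Here `clr_one` and `clr_half` differ, so the transformed values depend on an arbitrary choice of pseudocount. This is one reason the zero problem is described as needing better solutions.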