Normalization and microbial differential abundance strategies depend upon data characteristics

2017 | Sophie Weiss, Zhenjiang Zech Xu, Shyamal Peddada, Amnon Amir, Kyle Bittinger, Antonio Gonzalez, Catherine Lozupone, Jesse R. Zaneveld, Yoshiki Vázquez-Baeza, Amanda Birmingham, Embriette R. Hyde and Rob Knight
The paper evaluates the impact of data characteristics on the performance of normalization and differential abundance analysis methods in microbial ecology studies. It highlights the challenges posed by varying library sizes, the presence of many zeros, and the compositional nature of microbial data. The study compares seven normalization methods, including rarefying, scaling, and log-ratio transformations, and seven differential abundance testing methods. Key findings include:

1. **Normalization Efficacy**:
   - Rarefying is effective in clustering samples according to biological origin, especially for ordination metrics based on presence or absence.
   - Other normalization methods may be vulnerable to artifacts due to library size variations.
   - Rarefying can improve clustering accuracy, particularly for small and uneven library sizes.
2. **Differential Abundance Testing**:
   - Rarefying does not increase false discovery rates but reduces sensitivity due to data loss.
   - Parametric models like DESeq2 and edgeR perform well on smaller datasets but tend to have higher false discovery rates with larger or uneven library sizes.
   - ANCOM maintains a low false discovery rate and is suitable for drawing inferences about taxon abundance in the ecosystem.
3. **Conclusion**:
   - The choice of normalization and differential abundance techniques should be guided by the specific characteristics of the data.
   - Rarefying remains a useful technique for sample normalization, especially for presence/absence metrics.
   - ANCOM is recommended for differential abundance testing, while DESeq2 and edgeR are suitable for smaller datasets.

The study provides a comprehensive evaluation of various methods and their performance, offering practical guidance for researchers in microbial ecology studies.
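
To make the rarefying step concrete, here is a minimal sketch of subsampling every sample to a common depth without replacement. It is an illustration under stated assumptions, not the paper's own code: the function name `rarefy`, the samples-by-taxa layout of the count table, the toy counts, and the chosen depth are all hypothetical, and it relies on NumPy's `Generator.multivariate_hypergeometric` (available in NumPy 1.18 and later).

```python
import numpy as np


def rarefy(counts, depth, seed=None):
    """Subsample each sample (row) to `depth` reads without replacement.

    Samples with fewer than `depth` reads are discarded, which illustrates
    the data loss that comes with rarefying.
    Returns the rarefied table and a boolean mask of the samples kept.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)

    # Keep only samples whose library size reaches the target depth.
    keep = counts.sum(axis=1) >= depth

    rarefied = np.zeros((int(keep.sum()), counts.shape[1]), dtype=np.int64)
    for i, row in enumerate(counts[keep]):
        # Draw `depth` reads from the pool of observed reads, no replacement.
        rarefied[i] = rng.multivariate_hypergeometric(row, depth)
    return rarefied, keep


# Hypothetical toy table: rows are samples, columns are taxa.
table = np.array([
    [50, 30, 20, 0],       # library size 100
    [500, 300, 150, 50],   # library size 1000
    [5, 3, 0, 0],          # library size 8, below the chosen depth
])

rarefied, kept = rarefy(table, depth=100, seed=0)
```

After this call every retained sample has exactly 100 reads, so presence/absence comparisons are no longer driven by library size, while the third sample is dropped entirely. In practice the depth is a trade-off between how many samples are kept and how many reads per sample are thrown away.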