Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible

April 2014 | Paul J. McMurdie, Susan Holmes
The article argues against rarefying microbiome data, a common practice in microbiome research. Rarefying subsamples each sample's sequences down to a common library size, which discards valid data and adds uncertainty. Instead, the authors propose statistical models that account for both library size differences and biological variability, such as mixture models based on the Negative Binomial distribution. These models are already used in RNA-Seq analysis and can be adapted to microbiome data.

The authors demonstrate that rarefying leads to a high rate of false positives in tests for differential abundance and can discard samples that other methods would cluster accurately. They also compare several Negative Binomial methods with a zero-inflated Gaussian mixture model and find that the latter performs well when there are sufficient biological replicates, although it still tends toward a higher false positive rate. The authors recommend that investigators avoid rarefying altogether and use appropriate statistical methods for normalization, and they provide microbiome-specific extensions to these tools in the R package phyloseq. The article emphasizes variance-stabilizing transformations and hierarchical mixture models as ways to account for overdispersion and heteroscedasticity in microbiome data. The authors conclude that rarefying is statistically inadmissible and that alternative methods should be used to normalize and analyze microbiome data.
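
To make the data-loss argument concrete, here is a minimal Python sketch of rarefying as described above: each sample is subsampled without replacement to a common depth, and any sample with fewer reads than that depth is dropped. The function name `rarefy`, the toy counts, and the chosen depth are illustrative assumptions, not taken from the article or from phyloseq.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    `counts` maps taxon name -> observed read count.
    Returns None when the library is smaller than `depth`,
    mirroring how rarefying drops shallow samples entirely.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


# Hypothetical toy data: three samples with very different library sizes.
samples = {
    "A": {"taxon1": 600, "taxon2": 300, "taxon3": 100},     # 1,000 reads
    "B": {"taxon1": 6000, "taxon2": 3000, "taxon3": 1000},  # 10,000 reads
    "C": {"taxon1": 30, "taxon2": 15, "taxon3": 5},         # 50 reads
}
depth = 1000  # assumed common depth for this illustration

rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
kept = {name: r for name, r in rarefied.items() if r is not None}

total_reads = sum(sum(c.values()) for c in samples.values())
reads_kept = sum(sum(r.values()) for r in kept.values())
print(f"samples kept: {sorted(kept)}")
print(f"reads kept: {reads_kept} of {total_reads}")
```

In this toy case, sample C is discarded outright and only 2,000 of 11,050 reads survive, even though all three samples have identical taxon proportions. A model that treats library size as a known factor would use every read instead.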