April 2014 | Volume 10 | Issue 4 | e1003531 | Paul J. McMurdie, Susan Holmes*
The article "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible" by Paul J. McMurdie and Susan Holmes examines the statistical problems with rarefying microbiome count data. Rarefying, a common practice, randomly subsamples reads without replacement from each larger library down to a common depth, typically that of the smallest sample retained; samples below that depth are dropped. The authors argue that this is statistically inefficient and inappropriate for detecting differentially abundant species. Instead, they advocate a mixture model, such as the Negative Binomial (a Gamma-Poisson mixture), that accounts for both library size differences and biological variability. This approach, already well established in RNA-Seq analysis, can improve the power and accuracy of differential abundance detection. Using simulations and empirical data, the authors show that rarefying discards valid data and produces high rates of false positives. They recommend avoiding rarefying altogether and suggest tools such as DESeq2 and edgeR, which were designed for RNA-Seq data but handle microbiome count data effectively. The article provides R packages and code implementing these methods, emphasizing the importance of statistical rigor in microbiome research.
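The contrast between rarefying and scaling can be illustrated with a minimal sketch in Python using only the standard library. `rarefy` implements rarefying as random subsampling without replacement to a fixed depth. `median_of_ratios_size_factors` instead computes one scaling factor per sample in the style of DESeq's median-of-ratios normalization, so no reads are thrown away. Both function names and the toy counts are hypothetical illustrations, not the article's code or the DESeq2/edgeR API. Those packages are written in R and also fit a Negative Binomial model, which this sketch omits.

```python
import math
import random
from statistics import median


def rarefy(counts, depth, seed=None):
    """Subsample one library's taxon counts to `depth` reads without replacement.

    `counts` is a list of non-negative integer read counts, one per taxon.
    Returns a new list of counts that sums to exactly `depth`.
    """
    if depth > sum(counts):
        # Under rarefying, a library smaller than the target depth is dropped.
        raise ValueError("depth exceeds library size; this sample would be dropped")
    rng = random.Random(seed)
    # Expand counts into one entry per read, labelled by its taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    kept = rng.sample(reads, depth)  # sampling without replacement
    out = [0] * len(counts)
    for taxon in kept:
        out[taxon] += 1
    return out


def median_of_ratios_size_factors(libraries):
    """DESeq-style size factors: one scaling factor per sample, no reads discarded.

    `libraries` is a list of samples, each a list of counts over the same taxa.
    Only taxa with a non-zero count in every sample are used.
    """
    n_taxa = len(libraries[0])
    usable = [j for j in range(n_taxa) if all(lib[j] > 0 for lib in libraries)]
    if not usable:
        raise ValueError("no taxon is observed in every sample")
    # Log geometric mean of each usable taxon across samples (the pseudo-reference).
    log_geo = {
        j: sum(math.log(lib[j]) for lib in libraries) / len(libraries) for j in usable
    }
    # Each sample's factor is the median ratio of its counts to the pseudo-reference.
    return [
        math.exp(median(math.log(lib[j]) - log_geo[j] for j in usable))
        for lib in libraries
    ]


if __name__ == "__main__":
    samples = [
        [100, 50, 10, 40],    # 200 reads
        [400, 200, 40, 160],  # 800 reads: same proportions, 4x deeper
    ]
    depth = min(sum(s) for s in samples)
    rarefied = [rarefy(s, depth, seed=1) for s in samples]
    discarded = sum(map(sum, samples)) - sum(map(sum, rarefied))
    print(discarded)  # 600
    print([round(f, 6) for f in median_of_ratios_size_factors(samples)])  # [0.5, 2.0]
```

In this toy case rarefying throws away 600 of the deeper sample's 800 reads, while the size factors capture the same fourfold depth difference with every read kept.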