Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

(2018) 6:226 | Nicole M. Davis, Diana M. Proctor, Susan P. Holmes, David A. Relman, Benjamin J. Callahan
The paper introduces decontam, an open-source R package that addresses DNA contamination in microbial community surveys based on marker-gene and metagenomic sequencing (MGS). Contamination, which can arise from several sources including reagents, can significantly distort microbial community analyses. Decontam implements two statistical classification procedures to identify contaminants: a frequency-based method and a prevalence-based method. The frequency-based method identifies contaminants by the inverse correlation between their frequency and sample DNA concentration, while the prevalence-based method identifies them by their higher prevalence in negative controls than in true samples.
The package is validated on several datasets, including a human oral dataset, a dilution series, and placenta biopsies, demonstrating its effectiveness in reducing technical variation and improving the quality of microbial community profiles. Decontam is easy to integrate into existing MGS workflows and can be applied to various sequence features, making it a valuable tool for enhancing the reliability of MGS studies.
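The two classification procedures summarized above can be illustrated with a minimal Python sketch. This is not decontam's code or API (decontam is an R package): the function names `frequency_ratio` and `prevalence_pvalue` are hypothetical, and the statistics are simplified stand-ins chosen for illustration. The frequency sketch compares how well log-frequency is explained by a slope of -1 against log DNA concentration (contaminant-like) versus a flat line (non-contaminant-like); the prevalence sketch uses a one-sided Fisher exact test on presence/absence counts.

```python
import math


def frequency_ratio(freqs, concs):
    """Compare two fits of log(frequency) against log(DNA concentration).

    Returns SS_contaminant / SS_noncontaminant, where the contaminant model
    fixes the slope at -1 and the non-contaminant model fixes it at 0.
    Values well below 1 suggest contaminant-like behaviour.
    Illustrative assumption only, not decontam's actual score.
    """
    # Only samples where the feature was observed can be log-transformed.
    pairs = [(math.log(f), math.log(c)) for f, c in zip(freqs, concs) if f > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two samples with non-zero frequency")
    n = len(pairs)

    # Contaminant model: log f = a - log c, so (log f + log c) is constant.
    a = sum(lf + lc for lf, lc in pairs) / n
    ss_contam = sum((lf + lc - a) ** 2 for lf, lc in pairs)

    # Non-contaminant model: log f = b, independent of concentration.
    b = sum(lf for lf, _ in pairs) / n
    ss_noncontam = sum((lf - b) ** 2 for lf, _ in pairs)

    if ss_noncontam == 0:
        return math.inf  # perfectly flat frequencies: not contaminant-like
    return ss_contam / ss_noncontam


def prevalence_pvalue(present_controls, n_controls, present_samples, n_samples):
    """One-sided Fisher exact p-value for higher prevalence in negative controls.

    Small p-values suggest the feature is over-represented in negative
    controls relative to true samples, i.e. contaminant-like.
    """
    total_present = present_controls + present_samples
    denom = math.comb(n_controls + n_samples, total_present)
    upper = min(n_controls, total_present)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    numer = sum(
        math.comb(n_controls, k) * math.comb(n_samples, total_present - k)
        for k in range(present_controls, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Frequency halves each time concentration doubles: contaminant-like.
    print(frequency_ratio([0.1, 0.05, 0.025], [1.0, 2.0, 4.0]))
    # Present in every negative control, absent from every true sample.
    print(prevalence_pvalue(5, 5, 0, 5))
```

In practice a cutoff would be applied to each statistic to label a feature as a contaminant; the choice of cutoff is left open here because the summary does not specify one.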