2018 | Nicole M. Davis, Diana M. Proctor, Susan P. Holmes, David A. Relman, Benjamin J. Callahan
Decontam is an open-source R package that identifies and removes contaminant sequences in marker-gene and metagenomic sequencing (MGS) data. Contaminant DNA can distort microbial community profiles, particularly in low-biomass environments, and removing it improves the accuracy of those profiles. Decontam integrates easily into existing MGS workflows and requires minimal additional cost.

The package classifies each sequence feature as a contaminant or non-contaminant using two statistical patterns. The frequency-based method relies on DNA concentration data: contaminant sequences appear at higher frequencies in low-concentration samples. The prevalence-based method relies on negative controls: contaminant sequences are more prevalent in negative controls than in true samples. Decontam can be applied to various sequence features, including amplicon sequence variants (ASVs), operational taxonomic units (OTUs), and metagenome-assembled genomes (MAGs).

Decontam was validated on multiple datasets, including oral microbiome data and placenta biopsy data, where it successfully identified and removed contaminant sequences. In a study of placenta samples, analysis with decontam corroborated that the data did not support the existence of a placenta microbiome. In a study of preterm birth, decontam identified run-specific contaminants, improving the accuracy of biological inferences. Its use is recommended wherever contamination may affect results, especially in studies of low-biomass environments.
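
The two statistical patterns can be illustrated with a small, self-contained sketch. It is written in Python for illustration only and is not decontam's R API or its actual scoring procedure. The function names, the assumption that a contaminant's frequency is inversely proportional to total DNA concentration, the raw residual ratio, and the one-sided Fisher exact test are all simplifications introduced here.

```python
import math


def frequency_score(freqs, concs):
    """Compare two models of log(frequency) against log(DNA concentration).

    Assumed contaminant model: frequency is proportional to 1 / concentration
    (slope -1 in log-log space). Assumed non-contaminant model: frequency is
    independent of concentration (slope 0). Returns the ratio of squared
    residuals (contaminant model / non-contaminant model); values well below 1
    mean the feature looks more like a contaminant.
    """
    pairs = [(math.log(f), math.log(c))
             for f, c in zip(freqs, concs) if f > 0 and c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two samples where the feature is present")
    log_f = [p[0] for p in pairs]
    log_c = [p[1] for p in pairs]
    n = len(pairs)

    # Slope 0: the best-fitting intercept is the mean of log(freq).
    mean_f = sum(log_f) / n
    ss_non = sum((y - mean_f) ** 2 for y in log_f)

    # Slope -1: the best-fitting intercept is the mean of log(freq) + log(conc).
    shifted = [y + x for y, x in zip(log_f, log_c)]
    mean_s = sum(shifted) / n
    ss_con = sum((s - mean_s) ** 2 for s in shifted)

    if ss_non == 0:
        return math.inf if ss_con > 0 else 1.0
    return ss_con / ss_non


def prevalence_pvalue(pos_controls, n_controls, pos_samples, n_samples):
    """One-sided Fisher exact test on presence/absence counts.

    Returns the probability of seeing the feature in at least `pos_controls`
    negative controls if presence were unrelated to sample type. Small values
    mean the feature is over-represented in negative controls.
    """
    total = n_controls + n_samples
    positives = pos_controls + pos_samples
    denom = math.comb(total, positives)
    upper = min(n_controls, positives)
    tail = sum(math.comb(n_controls, k) * math.comb(n_samples, positives - k)
               for k in range(pos_controls, upper + 1))
    return tail / denom


if __name__ == "__main__":
    concs = [1.0, 2.0, 4.0, 8.0, 16.0]             # hypothetical DNA concentrations
    contaminant = [0.4 / c for c in concs]          # frequency falls as concentration rises
    genuine = [0.10, 0.11, 0.09, 0.10, 0.105]       # frequency roughly constant
    print(frequency_score(contaminant, concs))
    print(frequency_score(genuine, concs))
    print(prevalence_pvalue(5, 5, 1, 20))           # present in every control, rare in samples
    print(prevalence_pvalue(0, 5, 18, 20))          # absent from controls, common in samples
```

In this sketch a low residual ratio or a small p-value marks a feature as contaminant-like. Any cutoff applied to either value would be a further assumption and should be tuned against the data at hand.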