Bayesian community-wide culture-independent microbial source tracking

2013 | Dan Knights, Justin Kuczynski, Emily S. Charlson, Jesse Zaneveld, Michael C. Mozer, Ronald G. Collman, Frederic D. Bushman, Rob Knight, and Scott T. Kelley
SourceTracker is a Bayesian method for estimating the proportion of a microbial community that originates from each of a set of source environments. The authors applied it to microbial surveys of neonatal intensive care units (NICUs), offices, and molecular biology laboratories, and provided a database of known contaminants for future testing.
Advances in sequencing and informatics have greatly increased data acquisition and integration, transforming our understanding of the roles microbes play in health, disease, and biogeochemical cycling. Sample contamination, however, remains a significant challenge, because even trace contamination can become a serious problem. Computational methods that identify the sources and quantities of contamination could help prevent it in future studies.
SourceTracker models contamination as the mixing of entire source communities into a sink community (the sample being explained) and estimates the mixing proportions. Unlike previous methods, which focused on detecting predetermined indicator species, SourceTracker estimates source proportions directly and models uncertainty about both known and unknown sources. It uses Gibbs sampling to explore the joint distribution of sequence-to-source assignments, which allows for uncertainty in both the source and sink distributions.
The method was tested on barcoded pyrosequencing datasets of bacterial 16S rRNA gene sequences from various environments, including human skin, oral cavities, feces, and temperate soils. Using published datasets from likely contaminant sources, SourceTracker was compared with other methods and outperformed them, especially when the contamination was ambiguous. Its ability to detect low-level contamination was also tested: detection sensitivity is adjusted through the prior parameter β, and the method proved effective across a wide range of sensitivities, demonstrating its applicability beyond low-biomass environments.
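The mixture model and Gibbs sampler described above can be sketched in a few dozen lines of Python. This is a simplified, hypothetical implementation, not the authors' R code: the function name `source_proportions`, its parameters, and the prior values are illustrative assumptions. It also assumes that each known source's taxon distribution is fixed (smoothed training frequencies) and that only the Unknown source's distribution is learned from the sink. Each sink sequence is repeatedly reassigned to a source with probability proportional to P(taxon | source) × (sequences currently assigned to that source + β), and the post-burn-in assignment counts are averaged into mixing proportions.

```python
import random
from collections import Counter


def source_proportions(sources, sink, alpha=0.001, alpha_unknown=0.1,
                       beta=0.01, n_iter=300, burnin=100, seed=0):
    """Hypothetical collapsed Gibbs sketch: estimate the fraction of `sink`
    drawn from each known source plus one Unknown source (returned last).

    sources: list of {taxon: count} dicts (training communities)
    sink:    {taxon: count} dict (the community being explained)
    Prior values are illustrative, not taken from the paper.
    """
    if n_iter <= burnin:
        raise ValueError("n_iter must exceed burnin")
    # One entry per sink sequence, labelled with its taxon.
    seqs = [t for t in sorted(sink) for _ in range(sink[t])]
    if not seqs:
        raise ValueError("sink is empty")
    rng = random.Random(seed)
    taxa = sorted(set(sink).union(*sources))
    tau = len(taxa)
    k = len(sources) + 1                      # index k - 1 is the Unknown source

    # Assumption: known-source taxon distributions are fixed, Dirichlet-smoothed
    # training frequencies, P(t | v) = (m_tv + alpha) / (m_v + tau * alpha).
    known = []
    for src in sources:
        m_v = sum(src.values())
        known.append({t: (src.get(t, 0) + alpha) / (m_v + tau * alpha) for t in taxa})

    z = [rng.randrange(k) for _ in seqs]      # random initial source assignments
    n_v = Counter(z)                          # sequences currently assigned per source
    unk = Counter(t for t, v in zip(seqs, z) if v == k - 1)  # Unknown's taxon counts

    tally = [0] * k
    for it in range(n_iter):
        for i, t in enumerate(seqs):
            v = z[i]
            n_v[v] -= 1                       # remove sequence i from its source
            if v == k - 1:
                unk[t] -= 1
            # P(z_i = u | rest) ∝ P(t | u) * (n_u + beta); the shared
            # denominator (n - 1 + k * beta) cancels on normalisation.
            weights = [known[u][t] * (n_v[u] + beta) for u in range(k - 1)]
            p_unk = (unk[t] + alpha_unknown) / (n_v[k - 1] + tau * alpha_unknown)
            weights.append(p_unk * (n_v[k - 1] + beta))
            v = rng.choices(range(k), weights=weights)[0]
            z[i] = v                          # record the new assignment
            n_v[v] += 1
            if v == k - 1:
                unk[t] += 1
        if it >= burnin:                      # accumulate post-burn-in samples
            for u in range(k):
                tally[u] += n_v[u]
    total = sum(tally)
    return [c / total for c in tally]
```

For example, `source_proportions([{"a": 100}, {"b": 100}], {"a": 50, "c": 50})` attributes essentially nothing to the second source and roughly half or more of the sink to Unknown, because taxon `c` appears in no known source. Because the Unknown distribution here is learned from the sink alone, sequences of a taxon shared with a known source can drift between that source and Unknown; the sketch also omits refinements such as rarefying the sink or running several independent chains.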
SourceTracker was also applied to estimate the proportions of bacteria from various source environments in indoor samples treated as sinks, revealing that wet-lab surfaces were composed mainly of bacteria from the Skin and Unknown sources, with some exceptions. The method also identified the most common contaminating taxa. SourceTracker can be used for source tracking in a wide range of microbial community surveys and in shotgun metagenomics. The implementation is available as an R package, and the authors recommend automated tests to screen for contaminated samples before data deposition.
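The recommended automated screening step could look something like the following. This is a hypothetical sketch, not part of the SourceTracker package: `screen_samples`, `top_contaminant_taxa`, the source labels, and the 0.05 threshold are all illustrative assumptions. It takes per-sample proportion estimates from a source-tracking run, flags samples whose combined share from designated contaminant sources exceeds the threshold, and ranks taxa by how many sequences were assigned to those sources.

```python
from collections import Counter


def screen_samples(proportions, contaminant_sources, threshold=0.05):
    """Flag samples whose estimated share from contaminant sources exceeds
    `threshold` (hypothetical helper; the threshold is illustrative).

    proportions: {sample_id: {source_name: estimated proportion}}
    contaminant_sources: source names treated as contamination
    Returns (sample_id, contaminant share) pairs, largest share first.
    """
    flagged = []
    for sample_id, props in proportions.items():
        share = sum(props.get(s, 0.0) for s in contaminant_sources)
        if share > threshold:
            flagged.append((sample_id, share))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def top_contaminant_taxa(assignments, contaminant_sources, n=5):
    """Rank taxa by how many sequences were assigned to contaminant sources.

    assignments: iterable of (taxon, source_name) pairs, e.g. one per
    sequence from the final Gibbs sweep of each sample.
    """
    contaminants = set(contaminant_sources)
    counts = Counter(t for t, s in assignments if s in contaminants)
    return counts.most_common(n)


if __name__ == "__main__":
    estimates = {
        "S1": {"Skin": 0.02, "Lab": 0.01, "Unknown": 0.97},
        "S2": {"Skin": 0.30, "Lab": 0.05, "Unknown": 0.65},
    }
    print(screen_samples(estimates, ["Skin", "Lab"]))  # only S2 exceeds 0.05
```

A fixed threshold is the simplest rule; in practice the cut-off would depend on the study and on how the contaminant sources were defined.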