Ironing out the wrinkles in the rare biosphere through improved OTU clustering

2010 | Susan M. Huse, David Mark Welch, Hilary G. Morrison and Mitchell L. Sogin
This study addresses the overestimation of microbial diversity in environmental samples that arises when sequences are clustered into operational taxonomic units (OTUs). Deep sequencing of PCR amplicon libraries can overestimate diversity because of low-frequency, error-prone reads. The authors propose a new clustering method that uses a 2% single-linkage preclustering step followed by average-linkage clustering based on pairwise alignments. This method more accurately predicts the expected number of OTUs in both single and pooled template preparations of known taxonomic composition. It can reduce OTU richness in environmental samples by as much as 30–60%, but it does not reduce the fraction of OTUs in the long-tailed rank abundance curves that define the rare biosphere.
The study highlights the challenges of estimating microbial diversity through deep sequencing, particularly the accuracy of OTU richness estimates and the extent of the rare biosphere. Sequencing errors and inadequate clustering algorithms can artificially inflate estimates of community richness. The authors evaluated multiple aspects of the clustering process, including the contribution of sequencing error, alignment method, and clustering algorithm to the number of observed OTUs for communities of known taxonomic composition. They set their cluster threshold at 3% to minimize the influence of sequencing errors and to be consistent with previous work.
The authors amplified and pyrosequenced the V6 hypervariable region of the ribosomal RNA gene from single- and multiple-template pools to find a simple and computationally effective method for accurately clustering short hypervariable pyrotags. They then applied the new clustering method to multiple published environmental samples to determine the nature and extent of OTU inflation in diverse samples.
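The two-stage procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the distance is a simple mismatch fraction on equal-length toy sequences rather than the pairwise alignments the authors used, and the preclustering is plain connected-components single linkage, which may differ in detail from the published implementation.

```python
from itertools import combinations

def mismatch_fraction(a, b):
    """Toy distance (assumption): fraction of differing positions, equal lengths only."""
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage_precluster(seqs, dist, threshold=0.02):
    """Group reads joined by any chain of links with distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def average_linkage(preclusters, seqs, dist, threshold=0.03):
    """Merge clusters while the closest pair has mean pairwise distance <= threshold."""
    clusters = [list(c) for c in preclusters]

    def mean_dist(c1, c2):
        return sum(dist(seqs[i], seqs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (mean_dist(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters

def mutate(seq, positions):
    """Substitute a different base at each listed position (toy error model)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

# Made-up 200-base reads: one template with three error variants, plus two
# reads from a second, clearly different template.
template = "ACGT" * 50
other = mutate(template, range(100, 120))      # 10% from template
reads = [
    template,
    mutate(template, [0]),                     # 0.5% from template
    mutate(template, [0, 1, 2]),               # 1.5% from template
    mutate(template, range(6)),                # 3% from template, 1.5% from previous read
    other,
    mutate(other, range(150, 155)),            # 2.5% from `other`
]

preclusters = single_linkage_precluster(reads, mismatch_fraction)
otus = average_linkage(preclusters, reads, mismatch_fraction)
print(len(preclusters), "preclusters ->", len(otus), "OTUs")
```

In this toy data the fourth read is 3% from the template but still joins its precluster through a chain of closer variants, which is the behavior that lets single-linkage preclustering absorb error-prone reads before the 3% average-linkage step.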
The study found that the new clustering method significantly reduced the number of OTUs compared to the common method of multiple sequence alignment and complete-linkage clustering. The new method also reduced the number of spurious OTUs generated by errant sequences. The authors compared the new method to PyroNoise, another method for reducing the contribution of pyrosequencing error to inflated estimates of diversity. Both methods reduced the number of OTUs by approximately 30–50%. The study also examined the impact of the new method on environmental data sets and the rare biosphere. In these data sets, the new method generated 30–40% fewer OTUs than the common method, and the relative frequency of the most abundant OTUs was 10–50% greater using the new method. Neither the fraction of OTUs that contained only a single tag (singleton OTUs) nor the fraction of OTUs that contained 1–3 tags differed substantially between methods. Thus, while the new method reduces the total number of OTUs, it does so across a wide range of OTU abundances and does not reduce the proportion of OTUs that comprise the long tail of the abundance curve. The study concludes that the new clustering method provides a more accurate estimate of microbial diversity.
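The rare-biosphere comparison rests on a few simple ratios, which the following sketch computes from a list of OTU sizes (tags per OTU). The function name and both size lists are invented for illustration and are not data from the study; they are chosen only so that the totals match and the OTU count drops by 30%.

```python
def rare_tail_summary(otu_sizes):
    """Summarize a rank abundance distribution given the tag count of each OTU."""
    total_otus = len(otu_sizes)
    total_tags = sum(otu_sizes)
    return {
        "otus": total_otus,
        # Fraction of OTUs that contain exactly one tag.
        "singleton_fraction": sum(1 for s in otu_sizes if s == 1) / total_otus,
        # Fraction of OTUs that contain 1 to 3 tags.
        "low_abundance_fraction": sum(1 for s in otu_sizes if 1 <= s <= 3) / total_otus,
        # Share of all tags that fall in the largest OTU.
        "top_otu_relative_frequency": max(otu_sizes) / total_tags,
    }

# Hypothetical OTU sizes for the same 640 tags clustered two ways.
common_method = rare_tail_summary([400, 150, 60, 20, 3, 2, 2, 1, 1, 1])
new_method = rare_tail_summary([480, 130, 23, 3, 2, 1, 1])

for name, summary in (("common", common_method), ("new", new_method)):
    print(name, summary)
```

With these made-up numbers the OTU count falls from 10 to 7 and the largest OTU takes a bigger share of the tags, while the singleton and 1–3 tag fractions stay close to each other, which is the pattern the summary describes.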