Accuracy and quality of massively parallel DNA pyrosequencing

20 July 2007 | Susan M Huse, Julie A Huber, Hilary G Morrison, Mitchell L Sogin and David Mark Welch
This study evaluates the accuracy and quality of Roche GS20 pyrosequencing. To assess the error rate of the GS20 system, the researchers analyzed the V6 hypervariable region of ribosomal DNA from 43 bacterial clones. Unassembled sequences were 99.5% accurate, and removing low-quality reads raised the accuracy to 99.75% or better.

The error rate for the entire dataset was 0.49%. Insertions were the most common type of error (36% of errors), followed by deletions (27%), ambiguous bases (21%), and substitutions (16%). The study identified several factors that can be used to remove low-quality reads, including ambiguous base calls, homopolymer effects, and read length discrepancies. Reads with ambiguous base calls (Ns) contributed significantly to the error rate, and removing such reads reduced it to 0.24%.

The researchers concluded that using objective criteria to eliminate low-quality data can improve the accuracy of GS20 sequence reads in molecular ecological applications beyond that of traditional capillary sequencing methods. Identifying and removing low-quality reads matters most in studies that require high accuracy to detect natural variation. The results suggest that the GS20 system has the potential to revolutionize high-throughput sequencing, but that further improvements in the chemistry protocol and bioinformatics software are needed to reduce the error rate. The study also emphasizes the importance of using quality scores to assess the accuracy of sequencing data.
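Two of the filtering criteria described above, ambiguous base calls and read length discrepancies, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the `expected_length` and `tolerance` parameters, and the example reads are all hypothetical, and the summary does not state the thresholds the study actually used.

```python
from typing import Iterable, List


def has_ambiguous_base(read: str) -> bool:
    """Return True if the read contains at least one ambiguous base call (N)."""
    return "N" in read.upper()


def length_is_plausible(read: str, expected_length: int, tolerance: int) -> bool:
    """Return True if the read length is within `tolerance` bases of the expected length.

    Both `expected_length` and `tolerance` are illustrative assumptions,
    not values taken from the study.
    """
    return abs(len(read) - expected_length) <= tolerance


def filter_reads(reads: Iterable[str], expected_length: int, tolerance: int) -> List[str]:
    """Keep reads with no Ns and with a plausible length."""
    return [
        read
        for read in reads
        if not has_ambiguous_base(read)
        and length_is_plausible(read, expected_length, tolerance)
    ]


def accuracy_from_error_rate(error_rate_percent: float) -> float:
    """Convert a per-base error rate (in percent) to accuracy (in percent)."""
    return 100.0 - error_rate_percent


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    example_reads = ["ACGTACGT", "ACGNACGT", "ACG", "ACGTACGTA"]
    kept = filter_reads(example_reads, expected_length=8, tolerance=1)
    print(kept)
    # Error rates quoted in the summary: 0.49% overall, 0.24% after removing reads with Ns.
    print(accuracy_from_error_rate(0.49), accuracy_from_error_rate(0.24))
```

The last helper only restates the arithmetic behind the reported figures: an error rate of 0.49% corresponds to roughly 99.5% accuracy, and 0.24% to roughly 99.75%. Homopolymer effects are left out of the sketch because detecting them requires comparison against a reference sequence.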