Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities

1998 | Brent Ewing and Phil Green
The paper discusses the development and implementation of error probability estimation in the base-calling program *phred* for automated sequencer traces. The authors aim to improve the accuracy and reliability of data processing in high-throughput sequencing by estimating the probability of error for each base-call. They describe a novel algorithm that does not rely on multivariate normality assumptions, uses parameters computed from windows of the trace, and focuses on optimizing discrimination ability in high-quality regions (error rates <0.01).
The error probabilities are validated through extensive tests on different chemistries and electrophoretic conditions, showing high discrimination power and validity. The quality values assigned to base-calls are derived from these error probabilities, with higher quality values corresponding to lower error probabilities. The study also addresses potential improvements and limitations, including the impact of GC content and the need for separate training sets for different sequencing chemistries. The results demonstrate that the error probabilities can effectively distinguish between correct and incorrect base-calls, particularly in high-quality regions, and have significant implications for sequence assembly and finishing.
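
The summary above does not spell out how quality values relate to error probabilities. As a minimal sketch, assuming the standard phred definition q = -10 · log10(p), the conversion in both directions could look like the following. The function names are hypothetical and chosen for illustration; they are not part of the *phred* program itself.

```python
import math


def error_prob_to_quality(p: float) -> float:
    """Convert an estimated error probability to a phred-style quality value.

    Assumes the standard definition q = -10 * log10(p); the function name
    is illustrative, not an identifier from the phred source code.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def quality_to_error_prob(q: float) -> float:
    """Invert the mapping: recover the error probability from a quality value."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Higher quality values correspond to lower error probabilities.
    for q in (10, 20, 30, 40):
        print(f"q = {q:2d}  ->  p = {quality_to_error_prob(q):.4f}")
```

Under this definition, the high-quality region the authors target (error rates below 0.01) corresponds to quality values above 20.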