Base-calling of automated sequencer traces using phred. II. Error probabilities.

This paper presents a method for estimating the error probability of each base-call that phred makes from automated sequencer traces. Each estimate is computed from parameters derived from the trace data. The error probabilities are shown to be valid and to discriminate effectively between correct and incorrect base-calls under various sequencing conditions. They are crucial inputs to the assembly and finishing programs phrap and consed.

The quality of read data from automated sequencers varies, so reliable measures of that quality are essential for using the data effectively. Position-specific error probabilities are particularly useful: they improve assembly accuracy and completeness, allow more accurate consensus sequences, provide an objective criterion for finishing, and serve as a measure of data quality. Several base-calling algorithms produce confidence measures, but few have been validated for accuracy or discrimination power. Lawrence and Solovyev (1994) conducted a thorough study that defined trace parameters and applied discriminant analysis. This paper takes a different approach to estimating error probabilities and investigates its properties. Its main features are a novel algorithm that does not assume multivariate normality, the use of parameters computed over trace windows, which are more effective at discrimination, and an emphasis on optimizing discrimination in the high-quality parts of the trace.

Error probabilities are log-transformed, which makes error rates near zero easier to work with. The quality value q is defined as q = -10 × log₁₀(p), where p is the estimated error probability, so a base-call with a 1/1000 probability of being incorrect is assigned a quality value of 30. The paper also discusses the requirements that error probabilities must meet: they must be predictive and valid.
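
As a small illustration of the quality-value definition, the sketch below converts between an error probability p and a quality value q. The function names are hypothetical and are not taken from phred itself, and the range check on p is an assumption added so that the logarithm is always defined.

```python
import math


def error_prob_to_quality(p: float) -> float:
    """Return q = -10 * log10(p) for an estimated error probability p."""
    # Assumption: p must lie in (0, 1]; p = 0 would give an infinite quality.
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def quality_to_error_prob(q: float) -> float:
    """Invert the definition: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)
```

Calling error_prob_to_quality(0.001) yields 30 up to floating-point rounding, which matches the 1/1000 example in the text.
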
The paper describes a method for assigning error probabilities that maximizes discrimination power at small error rates. The method uses parameters derived from the trace data that reflect data quality; those found effective at detecting errors include peak spacing, the uncalled/called ratio, and peak resolution. The error probabilities are calibrated with a greedy algorithm that finds optimal parameter thresholds for error rates.

Results are presented from studies using ABI-processed trace data from four sets of cosmids. They show that the error probabilities are valid and distinguish correct from incorrect base-calls effectively, and that the quality values assigned to base-calls are accurate and useful for monitoring data quality. The paper also discusses the validity of the error probabilities under different sequencing conditions, the importance of high-quality data for assembly and finishing, and possible improvements to the calibration. It concludes that the calibration method is effective and can be improved further.
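
The greedy calibration mentioned in the summary can be sketched roughly as follows. The summary gives no algorithmic detail, so everything here is an assumption made for illustration: each base-call carries a tuple of trace parameters for which smaller values mean better data, the candidate thresholds are supplied by the caller, the error rate is smoothed as (errors + 1) / (calls + 1), and the function names are hypothetical. At each round the sketch picks the threshold combination whose selected calls have the lowest error rate, records it, removes those calls, and repeats.

```python
import itertools


def satisfies(params, cut):
    """True if every parameter is at or below its threshold."""
    return all(value <= limit for value, limit in zip(params, cut))


def greedy_calibrate(calls, thresholds):
    """Build an ordered table of (cut, error_rate) rows.

    calls: list of (params, is_error) pairs from reads with known truth.
    thresholds: one list of candidate cut values per parameter.
    """
    remaining = list(calls)
    table = []
    while remaining:
        best = None  # ((rate, -count), cut)
        for cut in itertools.product(*thresholds):
            selected = [err for params, err in remaining if satisfies(params, cut)]
            if not selected:
                continue
            # Assumed smoothing so tiny error-free sets are not over-trusted.
            rate = (sum(selected) + 1) / (len(selected) + 1)
            key = (rate, -len(selected))  # lowest rate, then largest set
            if best is None or key < best[0]:
                best = (key, cut)
        if best is None:
            break  # no candidate cut covers the remaining calls
        (rate, _), cut = best
        table.append((cut, rate))
        remaining = [(p, e) for p, e in remaining if not satisfies(p, cut)]
    return table


def lookup_error_prob(params, table, default=1.0):
    """Return the error rate of the first table row the call satisfies."""
    for cut, rate in table:
        if satisfies(params, cut):
            return rate
    return default
```

Because rows are recorded in the order they are found, a new base-call is assigned the error rate of the first row whose thresholds it meets. The exhaustive search over threshold combinations grows quickly with the number of candidates per parameter, so this form is only practical for small illustrative inputs.
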

1998 | Brent Ewing and Phil Green