Understanding "Recognition of protein coding regions in DNA sequences"

The paper presents a method for distinguishing protein-coding regions (PCS) from non-coding regions in DNA sequences. The method, called TESTCODE, is based on statistical properties of the base sequence and does not rely on specific initiation signals. The author defines eight numerical parameters, two for each of the four bases: one measures how unevenly that base is distributed among the three codon positions, and the other measures its overall content in the sequence. These parameters are weighted and combined into a single indicator, the TESTCODE statistic, whose value classifies a sequence as coding, non-coding, or "No Opinion". Tested on 400,000 bases of sequence data from the Los Alamos Sequence Library, TESTCODE showed a misclassification rate of 5% and a "No Opinion" rate of 18%. The method was also used to predict new coding and non-coding regions in published sequences, highlighting its potential for discovering new proteins and improving sequence analysis. The author discusses the limitations and applications of TESTCODE, emphasizing its utility in both experimental and theoretical contexts.

Recognition of protein coding regions in DNA sequences

1982 | James W. Fickett