Vol. 15 nos 7/8 1999 Pages 563-577 | Gerald Z. Hertz and Gary D. Stormo
The paper by Gerald Z. Hertz and Gary D. Stormo presents a method for identifying functional relationships among DNA, RNA, or protein sequences by aligning multiple sequences. The authors describe four components of their approach: a log-likelihood scoring scheme called information content, methods for estimating the P-value of an information content score, a method for counting the number of possible alignments given the sequence data, and a greedy algorithm for determining alignments of functionally related sequences. They also test the accuracy of their P-value calculations and give an example of using the algorithm to identify binding sites for the Escherichia coli CRP protein.
The paper distinguishes alignment models from alignment algorithms and focuses on alignments of functionally related sequences that contain no insertions or deletions. The authors detail how the information content of an alignment matrix is calculated and how its P-value is estimated with a large-deviation technique. They also describe a purely numerical method for approximating the P-value and provide algorithms for calculating the moment-generating function and its derivatives. The count of possible alignments is multiplied by the P-value to give the expected frequency of a score, which makes it possible to compare alignments with different widths and different numbers of sequences. Finally, they present two alignment algorithms: one in which the user specifies the alignment width and another in which the width is determined by adjusting a bias term.
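As a rough illustration of the scoring scheme, the information content of an ungapped alignment can be sketched as the sum, over alignment positions i and letters j, of f_ij * ln(f_ij / p_j), where f_ij is the observed frequency of letter j at position i and p_j is its background probability. The function name `information_content`, the uniform background, and the four short sites below are assumptions made for this example and are not taken from the paper; the natural logarithm is used here, and the sketch omits any small-sample handling and the P-value estimation.

```python
import math
from collections import Counter


def information_content(sites, background):
    """Return sum_i sum_j f_ij * ln(f_ij / p_j) for an ungapped alignment.

    sites      -- list of equal-length strings, one aligned site per sequence
    background -- dict mapping each letter to its a priori probability p_j
    (Hypothetical helper for illustration only.)
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("an ungapped alignment needs sites of equal width")

    n = len(sites)
    total = 0.0
    for i in range(width):
        # Count the letters observed in column i of the alignment.
        counts = Counter(site[i] for site in sites)
        # Letters that never occur contribute 0 (the limit of f * ln f as f -> 0),
        # so only observed letters are visited.
        for letter, count in counts.items():
            f = count / n
            total += f * math.log(f / background[letter])
    return total


# Made-up example data: four aligned sites of width 5, uniform DNA background.
uniform = {base: 0.25 for base in "ACGT"}
sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
score = information_content(sites, uniform)
print(f"information content: {score:.4f} nats")
```

A fully conserved column contributes ln(1 / p_j), which is the maximum possible for that letter, while a column whose letter frequencies match the background contributes zero. Higher totals therefore indicate alignments that deviate more strongly from what the background composition would produce.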