Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees

1987 | Da-Fei Feng and Russell F. Doolittle
Feng and Doolittle describe a progressive sequence alignment method that applies the Needleman and Wunsch pairwise alignment algorithm iteratively to align multiple protein sequences and construct evolutionary trees. The method assumes that the sequences share a common ancestor and builds trees from difference matrices derived from the multiple alignment. It gives priority to comparisons of recently diverged sequences over those that diverged in the distant past, following the rule "once a gap, always a gap." The method was applied to three sets of protein sequences: 7 superoxide dismutases, 11 globins, and 9 tyrosine kinase-like sequences. The resulting trees were compared with those from conventional pairwise methods, and in several cases the progressive method produced trees more consistent with biological expectations.

Constructing evolutionary trees from sequence data involves clustering sequences by similarity. Uncertainties in topology and branch lengths are common, however, and finding the "best tree" can take significant effort. Proper alignment of the sequences is crucial because it affects the accuracy of the tree. Alignments can be obtained by maximizing similarity or by minimizing differences. Pairwise alignments are typically used, but a set of pairwise alignments may not be mutually consistent when the sequences are grouped. To address this, multiple alignments are often created by shifting sequences to minimize differences.

The proposed method instead aligns progressively: it starts with the most similar sequences and then adds the next most similar sequence at each step. Gaps introduced at an early step are retained in every later step. Where gaps occur, neutral elements are inserted into the sequences so that those positions are invisible to the scoring system. The method provides multiple sequence alignments quickly and simply by objective criteria. It has been applied to several groups of protein sequences, resulting in trees that often agree with expected phylogenetic relationships.
The method is described in detail, with applications to specific protein sequences, and is shown to produce trees that differ significantly from traditional schemes but are often more biologically plausible.
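
The progressive procedure summarized above can be illustrated with a small Python sketch. It is a simplified illustration, not the authors' implementation. The match/mismatch/gap scores, the placeholder `X` for the neutral element, the greedy "next most similar" ordering, and the fraction-of-differences distance are all assumptions made for the example; the original work scores protein sequences with a substitution matrix and derives its own difference measure. The sketch also assumes the input sequences do not already contain `X` or `-`.

```python
from itertools import combinations

GAP = "-"      # gap symbol produced by a single pairwise alignment
NEUTRAL = "X"  # assumed placeholder for the neutral element that freezes a gap

def score(a, b, match=1, mismatch=-1):
    """Toy substitution score; a neutral element is invisible (scores 0)."""
    if NEUTRAL in (a, b):
        return 0
    return match if a == b else mismatch

def needleman_wunsch(s, t, gap=-2):
    """Global pairwise alignment. Returns (aligned_s, aligned_t, score)."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append(GAP); i -= 1
        else:
            a.append(GAP); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), F[n][m]

def progressive_align(seqs):
    """Align two or more sequences, most similar first; gaps are never removed."""
    idx = range(len(seqs))
    pair = {}
    for i, j in combinations(idx, 2):
        pair[i, j] = pair[j, i] = needleman_wunsch(seqs[i], seqs[j])[2]
    # Start with the most similar pair and freeze its gaps as neutral elements.
    i, j = max(combinations(idx, 2), key=lambda p: pair[p])
    a, b, _ = needleman_wunsch(seqs[i], seqs[j])
    rows = {i: a.replace(GAP, NEUTRAL), j: b.replace(GAP, NEUTRAL)}
    while len(rows) < len(seqs):
        # Pick the unaligned sequence most similar to any aligned one.
        new, partner = max(((r, g) for r in idx if r not in rows for g in rows),
                           key=lambda p: pair[p])
        aligned_g, aligned_new, _ = needleman_wunsch(rows[partner], seqs[new])
        # Copy every gap newly opened in the partner into all aligned rows.
        merged = {k: [] for k in rows}
        col = 0
        for ch in aligned_g:
            for k in rows:
                merged[k].append(GAP if ch == GAP else rows[k][col])
            if ch != GAP:
                col += 1
        rows = {k: "".join(v).replace(GAP, NEUTRAL) for k, v in merged.items()}
        rows[new] = aligned_new.replace(GAP, NEUTRAL)
    return [rows[k] for k in idx]

def difference_matrix(aligned):
    """Fraction of differing residues per pair, skipping neutral columns."""
    n = len(aligned)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        cols = [(a, b) for a, b in zip(aligned[i], aligned[j])
                if NEUTRAL not in (a, b)]
        diff = sum(a != b for a, b in cols)
        D[i][j] = D[j][i] = diff / len(cols) if cols else 1.0
    return D

if __name__ == "__main__":
    rows = progressive_align(["ACGT", "ACGT", "AGT"])
    print(rows)
    print(difference_matrix(rows))
```

Because each step only inserts columns and never deletes them, a gap placed while aligning closely related sequences survives every later step, which is the behavior the "once a gap, always a gap" rule describes. A tree-building routine would then take the matrix returned by `difference_matrix` as its input.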