Vol. 89, pp. 10915–10919, November 1992 | STEVEN HENIKOFF AND JORJA G. HENIKOFF
The paper by Steven Henikoff and Jorja Henikoff introduces a new approach to deriving amino acid substitution matrices from aligned protein sequence blocks, rather than using the traditional Dayhoff model of evolutionary rates. This method involves analyzing over 2000 blocks of aligned sequence segments from more than 500 groups of related proteins. The authors derive frequency tables from these blocks, which are then used to calculate a logarithm of odds (LOD) matrix. This matrix is designed to better capture the relationships between amino acids in highly conserved regions of proteins, leading to improved performance in sequence alignments and homology searches.
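The step from frequency tables to a log-odds matrix can be sketched in a few lines of Python. This is a minimal illustration, not the authors' PROTOMAT pipeline. It assumes ungapped blocks given as equal-length strings, counts every unordered residue pair in each column, and scores each pair as twice the base-2 logarithm of its observed frequency over its expected frequency (half-bit units, rounded). The function names `pair_counts` and `log_odds` are hypothetical, and any sequence weighting or clustering is left out.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(block):
    """Count unordered residue pairs, column by column, in an ungapped block."""
    counts = Counter()
    for col in range(len(block[0])):
        column = [seq[col] for seq in block]
        for a, b in combinations(column, 2):
            # Store each pair with its residues in sorted order so that
            # (A, S) and (S, A) land in the same cell of the frequency table.
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds(counts):
    """Turn pair counts into rounded log-odds scores in half-bit units."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}  # observed pair frequencies

    # Marginal probability of each residue occurring in a pair.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency if residues paired at random.
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# A single column of nine A residues and one S residue.
column_block = ["A"] * 9 + ["S"]
print(pair_counts(column_block))

# A toy two-column block (made-up data, for illustration only).
toy_block = ["AS", "AS", "AT"]
print(log_odds(pair_counts(toy_block)))
```

A pair that is never observed gets no entry here; a fuller implementation would have to decide how to score such pairs, which this sketch leaves open.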
The BLOSUM (Blocks Substitution Matrix) series, derived from this method, shows significant improvements over the Dayhoff PAM matrices in various tests, including multiple alignment accuracy, detection of ungapped alignments (BLAST), detection of gapped alignments (FASTA and Smith-Waterman), and determining the significance of an alignment (RDF2). The BLOSUM 62 matrix, in particular, outperforms the best PAM matrix (PAM 140) in 90 out of 504 tested groups.
The authors attribute the superior performance of BLOSUM matrices to their direct representation of relationships in highly conserved regions, which are more informative about distant evolutionary relationships than the extrapolated mutation rates used in the Dayhoff model. They also highlight the larger and more representative data set used in their approach, which includes a greater number of occurrences of specific amino acid pairs.
The study demonstrates the practical importance of these improved substitution matrices, particularly for weakly scoring alignments that are often missed or undervalued in traditional searches. The BLOSUM series is expected to remain stable due to its reliance on the identity and composition of groups in Prosite and the accuracy of the automated PROTOMAT system.