Improved tools for biological sequence comparison

April 1988 | WILLIAM R. PEARSON AND DAVID J. LIPMAN
The paper introduces three computer programs for comparing protein and DNA sequences: FASTA, RDF2, and LFASTA. FASTA is a more sensitive derivative of FASTP. It can search protein or DNA sequence databases, and it can compare a protein sequence to a DNA database by translating the DNA. RDF2 evaluates the significance of similarity scores using a shuffling method that preserves local sequence composition. LFASTA identifies local similarities between two sequences and displays them as a graphic matrix or as alignments. All three programs balance sensitivity and selectivity against speed and memory requirements.

FASTP and FASTA use a lookup table to find sequence identities; the ktup parameter sets how many consecutive identities are required for a match. The comparison proceeds in four steps: identifying regions of identity, rescoring them with a scoring matrix, optimizing the initial regions, and aligning the sequences. FASTA gains sensitivity over FASTP by joining multiple initial regions of similarity when doing so raises the score, whereas LFASTA considers all initial regions and computes a local alignment for each. Both FASTA and LFASTA use an optimized alignment method to improve the final scores. The programs support various scoring matrices and can be applied to any alphabet with arbitrary scoring values.

Statistical significance is evaluated by comparing a similarity score with the scores obtained from shuffled sequences; RDF2's composition-preserving shuffle gives more accurate significance estimates. The programs are implemented in C and run on various systems. Examples in the paper show FASTA's improved sensitivity compared with FASTP. The programs are useful both for identifying biologically significant similarities and for evaluating their statistical significance, and they are efficient enough to handle large datasets.
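The lookup-table step described above can be illustrated with a minimal Python sketch. This is not the authors' C implementation: the function names and the `keep` parameter are hypothetical, and the sketch only counts ktup matches per diagonal, which is a simplification of how the programs locate and score regions of identity.

```python
from collections import defaultdict


def build_lookup(query, ktup):
    """Map each ktup-length word in the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    return table


def diagonal_hits(query, library_seq, ktup=2):
    """Count ktup identities on each diagonal (query position minus library position)."""
    table = build_lookup(query, ktup)
    hits = defaultdict(int)
    for j in range(len(library_seq) - ktup + 1):
        # .get avoids creating empty entries for words absent from the query
        for i in table.get(library_seq[j:j + ktup], ()):
            hits[i - j] += 1
    return dict(hits)


def best_diagonals(query, library_seq, ktup=2, keep=10):
    """Return up to `keep` diagonals with the most ktup identities (assumed cutoff)."""
    hits = diagonal_hits(query, library_seq, ktup)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:keep]
```

A larger ktup means fewer table hits per library sequence, so the scan is faster but can miss weaker similarities; that is the sensitivity versus speed trade-off the summary mentions.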
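The shuffling approach to significance can be sketched in the same hedged spirit. The code below assumes a window-based shuffle as one way to preserve local composition, and it uses a toy ungapped identity score in place of the programs' matrix-based scores. The names `local_shuffle`, `toy_score`, and `z_value`, and the default window size and shuffle count, are illustrative assumptions rather than RDF2's actual interface.

```python
import random
import statistics


def local_shuffle(seq, window, rng):
    """Shuffle residues only within consecutive windows, so each window keeps its composition."""
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def toy_score(a, b):
    """Toy similarity score (assumption): most identities on any single ungapped diagonal."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i in range(max(0, offset), min(len(a), len(b) + offset))
            if a[i] == b[i - offset]
        )
        best = max(best, matches)
    return best


def z_value(a, b, score=toy_score, shuffles=100, window=10, seed=0):
    """Express the observed score in standard deviations above the shuffled mean."""
    rng = random.Random(seed)
    observed = score(a, b)
    shuffled = [score(a, local_shuffle(b, window, rng)) for _ in range(shuffles)]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        # Degenerate case: every shuffled score was identical
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Shuffling within windows rather than across the whole sequence keeps locally biased regions biased in the shuffled controls, so a high score that comes only from composition is less likely to look significant.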