2001 | Zemin Ning, Anthony J. Cox, and James C. Mullikin
The paper introduces SSAHA (Sequence Search and Alignment by Hashing Algorithm), an efficient method for searching large DNA databases. The algorithm pre-processes sequences by breaking them into k-tuples and storing their positions in a hash table. This allows for fast searching of query sequences by retrieving and sorting hits from the hash table. The authors discuss the impact of k-tuple length on search speed, memory usage, and sensitivity. Computational experiments show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA while requiring less memory than suffix tree methods. SSAHA is used for high-throughput SNP detection and large-scale sequence assembly, and it provides web-based sequence search facilities for Ensembl projects. The paper also details the construction of the hash table, sequence search process, and results of performance tests, demonstrating that SSAHA is particularly effective for databases of human genome size or larger.
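
The hash-then-sort idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_hash_table`, `search`) and the `min_hits` threshold are hypothetical, and the sketch assumes that the database is indexed at non-overlapping k-tuples while the query is scanned at every offset, a detail the summary does not spell out. A plain dictionary stands in for the hash table.

```python
from collections import defaultdict
from itertools import groupby


def build_hash_table(sequences, k):
    """Map each k-tuple to the (sequence index, offset) pairs where it occurs."""
    table = defaultdict(list)
    for index, seq in enumerate(sequences):
        # Assumption: only non-overlapping k-tuples (step k) are indexed,
        # which keeps the table to roughly len(seq) / k entries per sequence.
        for offset in range(0, len(seq) - k + 1, k):
            table[seq[offset:offset + k]].append((index, offset))
    return table


def search(table, query, k, min_hits=2):
    """Return (index, shift, hit_count) for runs of hits that agree on
    sequence index and shift (database offset minus query offset)."""
    hits = []
    # Scan every overlapping k-tuple of the query and collect its hits.
    for t in range(len(query) - k + 1):
        for index, offset in table.get(query[t:t + k], ()):
            hits.append((index, offset - t, offset))
    # Sorting groups hits from the same sequence and the same diagonal.
    hits.sort()
    matches = []
    for (index, shift), run in groupby(hits, key=lambda h: (h[0], h[1])):
        count = len(list(run))
        if count >= min_hits:  # hypothetical threshold for reporting a match
            matches.append((index, shift, count))
    return matches


database = ["ACGTACGTTTGACCA", "GGGGCCCCAAAATTTT"]
table = build_hash_table(database, k=4)
print(search(table, "CCCCAAAATT", k=4))  # [(1, 4, 2)]
```

In this toy run the query lines up with sequence 1 at shift 4, supported by two k-tuple hits. Under the non-overlapping indexing assumed here, an exact match must span at least 2k - 1 bases before it is guaranteed to contain an indexed k-tuple, which is one way to see why a larger k shrinks the table and the hit lists but lowers sensitivity.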