June 21, 2008 | Aleksandr Morgulis, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala and Alejandro A. Schäffer
The paper presents a new version of the MegaBLAST module of BLAST that uses a database index to find the initial seeds for matches. This approach, called 'indexed MegaBLAST', is faster than the 'non-indexed' version for most practical uses and outperforms miBLAST, another BLAST implementation that relies on a preprocessed database, for most of the queries tested. The new version was integrated into NCBI's Web BLAST service, where response times improved by dedicating machines to specific databases. The code for indexed MegaBLAST is part of the NCBI C++ toolkit, which also includes the preprocessor program makembindex. Indexed MegaBLAST has been used in production to search the human and mouse genomes since October 2007.
The paper describes the database index structure, the seed search algorithm, and the testing strategy. It shows that indexed MegaBLAST is faster than non-indexed MegaBLAST for most queries, especially when the database is masked. The advantage comes from reduced seed-search time and efficient handling of masked sequences. The paper also compares the two versions in production environments, where indexed MegaBLAST is faster for shorter queries and non-indexed MegaBLAST is faster for longer ones. The results support database indexing over query indexing as a way to improve BLAST performance, and the paper concludes that indexed MegaBLAST is a viable alternative to traditional BLAST methods for sequence comparison.
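To make the idea of database indexing concrete, the following is a minimal sketch of seed lookup through a precomputed k-mer index. It is an illustration under simplifying assumptions, not the index structure used by makembindex or the NCBI C++ toolkit: the function names `build_kmer_index` and `find_seeds`, the dictionary layout, the convention that lowercase letters mark masked regions, and the toy sequence are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(db, k, skip_masked=True):
    """Map each k-mer to the (sequence id, offset) pairs where it occurs.

    `db` is a dict of sequence id -> sequence string. Lowercase letters are
    treated as soft-masked bases (an assumption made for this sketch).
    """
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # A k-mer that overlaps a masked region is never stored, so the
            # search stage spends no time on seeds inside masked sequence.
            if skip_masked and not kmer.isupper():
                continue
            index[kmer.upper()].append((seq_id, i))
    return index


def find_seeds(index, query, k):
    """Return (query offset, sequence id, subject offset) for every exact k-mer hit."""
    seeds = []
    for q in range(len(query) - k + 1):
        # One dictionary lookup per query k-mer; the database is not rescanned.
        for seq_id, s in index.get(query[q:q + k].upper(), ()):
            seeds.append((q, seq_id, s))
    return seeds


# Toy database with a soft-masked stretch in the middle (hypothetical data).
db = {"chr_demo": "ACGTACGTTTgggcccAAACGT"}
index = build_kmer_index(db, k=4)
seeds = find_seeds(index, "ACGT", k=4)
print(seeds)
```

The index is built once per database and reused for every query, which is the trade-off database indexing makes: preprocessing cost and memory in exchange for cheaper seed searches. Dropping masked k-mers at build time is one plausible reading of why the summary reports the largest gains on masked databases.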