Sensitive protein alignments at tree-of-life scale using DIAMOND

2021-04-07 | Benjamin Buchfink, Klaus Reuter, Hajk-Georg Drost
The article introduces an improved version of the protein alignment tool DIAMOND that substantially improves search performance and computational efficiency. The new version can perform tree-of-life scale protein alignments in hours, matching the sensitivity of the gold-standard BLASTP while running 80 to 360 times faster. The improvements include optimized algorithmic procedures, double indexing, and multiple spaced seeding, which together allow large datasets to be processed efficiently. The tool is available as open-source software under the GPLv3 license. The authors benchmark the new version against BLASTP and MMseqs2, showing that DIAMOND reaches comparable sensitivity in far less computation time.
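
The interplay of double indexing and multiple spaced seeds can be illustrated with a toy sketch. It is not DIAMOND's implementation: it assumes two hypothetical seed shapes (not the shapes DIAMOND actually uses), compares residues exactly (no substitution scoring, no extension into full alignments), and reads "double indexing" as sorting the seeds of both the query set and the reference set and walking the two sorted lists together in a merge join. All function names are illustrative.

```python
# Hypothetical seed shapes: '1' = position that must match, '0' = "don't care".
# Illustrative only; these are NOT the shapes used by DIAMOND.
SHAPES = ["1101011", "1110011"]

def extract_seeds(seq, shape):
    """Yield (seed_key, offset) for every window of `seq` that fits `shape`."""
    care = [i for i, c in enumerate(shape) if c == "1"]
    for start in range(len(seq) - len(shape) + 1):
        yield "".join(seq[start + i] for i in care), start

def sorted_seed_list(seqs, shape):
    """Index one sequence set: all (seed_key, seq_id, offset) entries, sorted by key."""
    entries = [(key, seq_id, pos)
               for seq_id, seq in seqs.items()
               for key, pos in extract_seeds(seq, shape)]
    entries.sort()
    return entries

def merge_join(q_list, r_list):
    """Walk two key-sorted lists in lockstep and yield every pair sharing a key."""
    i = j = 0
    while i < len(q_list) and j < len(r_list):
        qk, rk = q_list[i][0], r_list[j][0]
        if qk < rk:
            i += 1
        elif qk > rk:
            j += 1
        else:
            # Find the full run of entries with this key on both sides.
            i_end = i
            while i_end < len(q_list) and q_list[i_end][0] == qk:
                i_end += 1
            j_end = j
            while j_end < len(r_list) and r_list[j_end][0] == qk:
                j_end += 1
            for _, q_id, q_pos in q_list[i:i_end]:
                for _, r_id, r_pos in r_list[j:j_end]:
                    yield q_id, q_pos, r_id, r_pos
            i, j = i_end, j_end

def seed_hits(queries, refs, shapes=SHAPES):
    """Return (query_id, ref_id, diagonal) triples found by at least one shape."""
    hits = set()
    for shape in shapes:
        # Both sides are indexed, then the two indexes are traversed together.
        q_list = sorted_seed_list(queries, shape)
        r_list = sorted_seed_list(refs, shape)
        for q_id, q_pos, r_id, r_pos in merge_join(q_list, r_list):
            # Hits on the same diagonal (offset difference) point to one candidate alignment.
            hits.add((q_id, r_id, q_pos - r_pos))
    return hits

queries = {"q1": "MKTAYIAKQRQISFVKSHFSRQ"}
refs = {"r1": "GGMKTAYIAKQRQISFVKSHFSRQ", "r2": "PPPPPPPPPPPPPPPPPPPP"}
print(sorted(seed_hits(queries, refs)))  # → [('q1', 'r1', -2)]
```

Using several shapes raises the chance that at least one of them tolerates the mismatches in a diverged homolog, while the merge join keeps memory access sequential because both seed lists are scanned in sorted order.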
They also demonstrate the scalability of DIAMOND on a supercomputer by aligning all 281 million protein sequences in the NCBI nr database against the UniRef50 database in under 18 hours. The article highlights the potential of DIAMOND for large-scale comparative genomics studies, such as tracing protein evolution and inferring gene age.
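
The scale of the supercomputer run can be put in perspective with simple arithmetic. The sketch below assumes only the two figures quoted above (a rounded count of 281 million sequences and an upper bound of 18 hours of wall-clock time), so it yields a rough lower bound on aggregate throughput across the whole job, not a per-node rate. The helper name is hypothetical.

```python
SEQUENCES = 281_000_000  # NCBI nr proteins aligned (rounded figure from the summary)
WALL_HOURS = 18          # upper bound on wall-clock time ("under 18 hours")

def min_throughput(n_seqs, hours):
    """Lower bound on aggregate query throughput, in sequences per second."""
    return n_seqs / (hours * 3600)

rate = min_throughput(SEQUENCES, WALL_HOURS)
print(f"> {rate:,.0f} sequences/s, > {SEQUENCES / WALL_HOURS / 1e6:.1f} million sequences/h")
# → > 4,336 sequences/s, > 15.6 million sequences/h
```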