February 2024 | Michel van Kempen, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding & Martin Steinegger
Foldseek is a fast and accurate method for protein structure search. It describes the tertiary interactions between residues as a sequence over a structural alphabet, 3Di, so that structures can be compared with fast, sequence-like alignment. Compared with the established tools Dali, TM-align, and CE, Foldseek reduces computation times by four to five orders of magnitude while reaching 86%, 88%, and 133% of their sensitivities, respectively. It is more than 4,000 times faster than TM-align and Dali, and over 21,000 times faster than CE. Its prefilter and alignment stages enable efficient searches through large databases, with performance comparable to or better than that of existing tools. Foldseek also reports accurate E values and produces alignments with high sensitivity and precision. It detects homologous structures even when they are not globally superposable. Foldseek is available as a webserver for multi-database searches and has been benchmarked against other tools on the SCOPe dataset. The method is expected to transform structural biology and bioinformatics by enabling efficient analysis of large protein structure databases.
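
To make the idea of "sequence-like alignment" over a structural alphabet concrete, the following is a minimal sketch, not Foldseek's implementation. It assumes each structure has already been encoded as a string of discrete state letters, and it scores two such strings with a plain Smith-Waterman local alignment. The function name `local_alignment_score`, the toy letters, and the match, mismatch, and gap values are all illustrative assumptions; the real 3Di states and their substitution scores are not reproduced here.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score of two state strings.

    Uses a linear gap penalty and a toy match/mismatch score
    (assumed values, standing in for a real substitution matrix).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical state strings sharing the local segment "ABCD".
print(local_alignment_score("XXABCDYY", "ZZABCDWW"))  # 8
```

Because the score comes from the best local segment only, two strings can score well even when the rest of their states differ, which matches the point above about detecting homologs that are not globally superposable.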