2022-03-10 | Richard Evans, Michael O'Neill, Alexander Pritzel, Natasha Antropova, Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell, Jason Yim, Olaf Ronneberger, Sebastian Bodenstein, Michal Zielinski, Alex Bridgland, Anna Potapenko, Andrew Cowie, Kathryn Tunyasuvunakool, Rishub Jain, Ellen Clancy, Pushmeet Kohli, John Jumper, Demis Hassabis
This paper presents AlphaFold-Multimer, an AlphaFold model for predicting the structures of multi-chain protein complexes. The model is trained specifically on multimeric inputs of known stoichiometry, which significantly improves the accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates, AlphaFold-Multimer achieves at least medium accuracy (DockQ ≥ 0.49) on 13 targets and high accuracy (DockQ ≥ 0.8) on 7 targets, outperforming the previous state-of-the-art system. On a large dataset of 4,446 recent protein complexes, AlphaFold-Multimer successfully predicts heteromeric interfaces in 70% of cases and produces high-accuracy predictions in 26% of cases, improvements of +27 and +14 percentage points, respectively, over the flexible linker modification of AlphaFold. For homomeric interfaces, the success rate is 72% and the high-accuracy rate is 36%, improvements of +8 and +7 percentage points, respectively. The paper details the modifications made to the AlphaFold system to handle multiple chains during training and inference, including multi-chain permutation alignment, multiple sequence alignment construction, cross-chain genetics, and multi-chain cropping. The results show that AlphaFold-Multimer outperforms existing approaches, including those built on AlphaFold.
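The DockQ thresholds quoted above (0.49 for medium accuracy, 0.8 for high accuracy) can be read as bins over a score in [0, 1]. The following is a minimal sketch of how per-target scores might be tallied into those bins. It is not code from the paper: the function names and the example scores are hypothetical, and the 0.23 cutoff for an "acceptable" prediction is an assumption taken from the usual DockQ convention, not from the summary above.

```python
from typing import Dict, Iterable

# Thresholds 0.49 and 0.80 come from the summary above.
# ASSUMPTION: 0.23 as the "acceptable" cutoff follows the usual DockQ
# convention; it is not stated in the summary.
ACCEPTABLE, MEDIUM, HIGH = 0.23, 0.49, 0.80


def dockq_band(score: float) -> str:
    """Map a DockQ score in [0, 1] to a quality band (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"DockQ score must lie in [0, 1], got {score}")
    if score >= HIGH:
        return "high"
    if score >= MEDIUM:
        return "medium"
    if score >= ACCEPTABLE:
        return "acceptable"
    return "incorrect"


def tally(scores: Iterable[float]) -> Dict[str, int]:
    """Count targets that reach at least medium, and high, accuracy."""
    bands = [dockq_band(s) for s in scores]
    return {
        "targets": len(bands),
        "at_least_medium": sum(b in ("medium", "high") for b in bands),
        "high": sum(b == "high" for b in bands),
    }


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not results from the paper.
    example_scores = [0.85, 0.52, 0.30, 0.10]
    print(tally(example_scores))
```

Note that "at least medium" counts include the high-accuracy targets, which is why the 13 medium-or-better targets reported above contain the 7 high-accuracy ones.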