2024 | Gabriele Corso*, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, Tommi Jaakkola
This paper introduces DOCKGEN, a new benchmark for evaluating the generalization ability of molecular docking methods. The authors show that existing machine learning-based docking models generalize poorly to unseen protein classes. Analyzing the scaling behavior of ML-based docking, they find that increasing data and model size substantially improves generalization, and they propose DIFFDOCK-L, a new state-of-the-art docking method that raises the success rate on the DOCKGEN benchmark from 7.1% to 22.6%.
To push generalization further, the authors propose CONFIDENCE BOOTSTRAPPING, a self-training paradigm that exploits the interaction between a diffusion model and a confidence model: poses sampled by the diffusion model are scored by the confidence model, and high-confidence predictions are fed back as training signal for unseen protein classes. On the DOCKGEN benchmark, this raises the success rate from 9.8% to 24.0%. The authors conclude that these methods significantly improve the generalization ability of docking models and bring the field closer to a generalizable solution to the docking challenge.
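The self-training loop behind CONFIDENCE BOOTSTRAPPING can be illustrated with a toy sketch: a generator proposes candidates, a fixed scorer ranks them, and the highest-scoring candidates are used as pseudo-labels to update the generator. This is a minimal illustration of the feedback idea only; the stand-in "models" below (a Gaussian sampler and a distance-based scorer) are invented for the example and do not reflect the paper's actual diffusion or confidence architectures.

```python
import random

random.seed(0)

def sample_poses(model_bias, n=50):
    # Stand-in for the diffusion model: draw candidate "poses"
    # (here just scalars) around the generator's current parameter.
    return [model_bias + random.gauss(0, 1.0) for _ in range(n)]

def confidence(pose, target=5.0):
    # Stand-in for the confidence model: scores a pose higher the
    # closer it is to a target the generator never sees directly.
    return -abs(pose - target)

def confidence_bootstrap(rounds=10, keep=10):
    bias = 0.0  # initial generator parameter
    for _ in range(rounds):
        poses = sample_poses(bias)
        # Rank candidates with the confidence model, keep the best.
        best = sorted(poses, key=confidence, reverse=True)[:keep]
        # "Fine-tune" the generator on the high-confidence pseudo-labels.
        bias = sum(best) / len(best)
    return bias

print(round(confidence_bootstrap(), 2))
```

Over successive rounds the generator drifts toward regions the scorer rates highly, mirroring how the confidence model's feedback steers the diffusion model toward correct poses on protein classes it was never trained on.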