10 May 2024 | Till Siebenmorgen, Filipe Menezes, Sabrina Benassou, Erinc Merdivan, Kieran Didi, André Santos Dias Mourão, Radosław Kitel, Pietro Liò, Stefan Kesselheim, Marie Piraud, Fabian J. Theis, Michael Sattler, Grzegorz M. Popowicz
MISATO is a comprehensive dataset designed for structure-based drug discovery, combining quantum mechanical (QM) properties of small molecules and molecular dynamics (MD) simulations of approximately 20,000 experimental protein–ligand complexes. The dataset addresses the need for precise biomolecule–ligand interaction data, which is crucial for advancing structure-based drug discovery (SBDD). The QM calculations refine experimental structures, correcting errors and inconsistencies, while the MD simulations capture the dynamics and conformational flexibility of protein–ligand complexes over timescales ranging from nanoseconds to microseconds. The dataset includes extensive validation of experimental data, ensuring its reliability.
AI baseline models trained on MISATO demonstrate improved accuracy in predicting quantum chemical properties, binding affinity, and protein flexibility. The dataset is publicly available and provides a user-friendly format for machine learning (ML) codes, facilitating the development of next-generation AI models for SBDD. The MISATO project aims to transform SBDD by providing a robust and comprehensive resource for researchers in chemistry, structural biology, biophysics, and bioinformatics.