PLINDER: The protein-ligand interactions dataset and evaluation resource

July 19, 2024 | Janani Durairaj, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, Xavier Robin, Gabriel Studer, Daniel Kovtun, Emanuele Rossi, Guoqing Zhou, Srimukh Veccham, Clemens Iser, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, Hannes Stärk, Gerardo Tauriello, Zachary Carpenter, Michael Bronstein, Emine Kucukbenli, Torsten Schwede, Luca Naef
PLINDER is a comprehensive annotated dataset for protein-ligand interactions (PLI), and the largest of its kind, comprising 449,383 PLI systems, each with over 500 annotations. It covers several types of PLI systems, including multi-ligand systems, oligonucleotides, peptides, and saccharides. PLINDER calculates similarity metrics at the protein, pocket, PLI, and ligand levels, which makes it possible to measure diversity and to detect information leakage between splits. The dataset also provides quality and domain information for complexes and links *holo* complexes to relevant *apo* and predicted structures. Its splitting algorithm is designed to produce diverse training sets and high-quality test sets while minimizing task-specific leakage. DiffDock, a deep learning-based method, is evaluated on different splits of PLINDER, and the results demonstrate the importance of training set size and diversity for model accuracy. The dataset and associated code are available for public use, with the aim of advancing the field of protein-ligand interaction prediction.
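
To make the idea of a leakage-aware split concrete, here is a minimal sketch in Python. It is not PLINDER's splitting algorithm, which the summary above does not describe in detail. It assumes a single pairwise similarity score per pair of systems and a single cutoff, groups systems into connected components of the resulting similarity graph, and assigns whole components to either train or test. The function names, the toy scores, the `threshold`, and the `test_fraction` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def similarity_clusters(ids, similar_pairs):
    """Group ids into connected components of the similarity graph (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    # Smallest clusters first, with ties broken by member names for determinism.
    return sorted(clusters.values(), key=lambda c: (len(c), c))


def split_by_cluster(ids, scores, threshold, test_fraction):
    """Assign whole clusters to test until the target size is reached."""
    similar_pairs = [pair for pair, s in scores.items() if s >= threshold]
    target = test_fraction * len(ids)
    train, test = [], []
    for cluster in similarity_clusters(ids, similar_pairs):
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


def leaked_pairs(train, test, scores, threshold):
    """Return scored pairs at or above the threshold that straddle the split."""
    train_set, test_set = set(train), set(test)
    return [
        (a, b)
        for (a, b), s in scores.items()
        if s >= threshold
        and ((a in train_set and b in test_set) or (a in test_set and b in train_set))
    ]


# Toy data (hypothetical): six systems and a few pairwise similarity scores.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
scores = {
    ("s1", "s2"): 0.9,
    ("s2", "s3"): 0.8,
    ("s4", "s5"): 0.7,
    ("s1", "s4"): 0.2,
    ("s5", "s6"): 0.1,
}

train, test = split_by_cluster(ids, scores, threshold=0.5, test_fraction=0.5)
leaks = leaked_pairs(train, test, scores, threshold=0.5)
```

Because similar systems always land on the same side of the split, no pair above the cutoff can straddle train and test. A real pipeline would need to handle several similarity levels (protein, pocket, PLI, ligand) and test-set quality criteria, which this sketch leaves out.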