A review of deep learning methods for ligand based drug virtual screening

This review article provides a comprehensive overview of deep learning methods in ligand-based drug virtual screening, a computational approach to accelerate drug discovery by screening potential drug candidates from large databases. The article highlights the challenges and advancements in this field, emphasizing the importance of deep learning in capturing complex drug-target interactions and binding affinities. Key topics include:

1. **Introduction to Virtual Screening**: The article defines virtual screening and its two main categories, receptor-based and ligand-based. It explains the problem of predicting drug-target interactions (DTI) and drug-target affinities (DTA).
2. **Traditional Machine Learning Methods**: Various machine learning techniques, such as regression models and classification methods, are discussed, including their applications in virtual screening. The limitations of these methods, such as feature engineering and data annotation, are also addressed.
3. **Task Challenges**: The complexity of proteins, drugs, and their interactions poses significant challenges for deep learning-based virtual screening. Data complexity, data annotation, and feature representation are highlighted as key issues.
4. **Model Selection**: The article reviews various deep learning models used in virtual screening, including deep neural networks, graph-based models, and transformer models. It also discusses the use of pre-trained models and their potential in improving performance.
5. **Databases for Virtual Screening**: The article describes popular databases used in virtual screening, including drug-centered, protein-centered, and integrated databases. It provides details on benchmark datasets for DTI and DTA prediction, such as DUD-E, MUV, BindingDB, Davis, KIBA, and PDBbind.
6. **Data Representations**: Effective data representations for ligand drugs and target proteins are discussed. For ligands, string representations (e.g., SMILES), fingerprinting, and graph representations (2D and 3D) are explored. For proteins, string representations (e.g., one-hot encoding) and graph representations (2D and 3D) are considered.
7. **Pre-trained Model Embeddings**: The article reviews pre-trained models for drug and protein representations, such as Mol2vec, SMILES Transformer, ChemBERTa, and TrimNet, which have shown promising performance in downstream tasks.

The review concludes by discussing the major issues and future directions in virtual screening, emphasizing the need for further research in data representation, model selection, and the integration of 3D protein structures.
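To make the string representations mentioned under Data Representations more concrete, the following is a minimal sketch of one-hot encoding a protein sequence and integer-encoding a SMILES string. It is not taken from the reviewed article: the function names `one_hot_protein` and `encode_smiles`, the 20-letter residue alphabet, the fixed `max_len`, and the character-level tokenization are all assumptions made for illustration. In particular, splitting SMILES character by character treats two-letter atoms such as `Cl` as two tokens, which practical tokenizers usually avoid.

```python
# Illustrative sketch only: alphabet, padding length, and helper names are
# assumptions, not details from the reviewed article.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_protein(sequence, max_len=8):
    """Return a max_len x 20 one-hot matrix for a protein sequence.

    Padding positions and residues outside the alphabet become all-zero rows;
    sequences longer than max_len are truncated.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for pos in range(max_len):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(sequence) and sequence[pos] in index:
            row[index[sequence[pos]]] = 1
        matrix.append(row)
    return matrix


def encode_smiles(smiles, vocab=None):
    """Map each SMILES character to an integer id (character-level tokens).

    Ids start at 1 so that 0 can be reserved for padding. The vocabulary is
    built on the fly and returned so it can be reused for further strings.
    """
    if vocab is None:
        vocab = {}
    ids = []
    for ch in smiles:
        if ch not in vocab:
            vocab[ch] = len(vocab) + 1
        ids.append(vocab[ch])
    return ids, vocab


if __name__ == "__main__":
    protein = one_hot_protein("MKT", max_len=4)
    print(len(protein), len(protein[0]))  # rows x alphabet size
    ids, vocab = encode_smiles("CCO")     # ethanol
    print(ids, vocab)
```

Matrices and id sequences of this kind are what sequence models typically consume as input. Fingerprint and graph representations would normally rely on a cheminformatics toolkit and are left out of this sketch.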

2024 | Hongjie Wu, Junkai Liu, Runhua Zhang, Yaoyao Lu, Guozeng Cui, Zhiming Cui, Yijie Ding