This review summarizes the latest advances in deep learning methods for ligand-based drug virtual screening. It first introduces the basic concepts of virtual screening, common datasets, and data representation methods. It then compares and analyzes numerous deep learning methods for drug virtual screening. In addition, datasets of different sizes are constructed to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Drug discovery is costly and time-consuming, and modern drug discovery efforts increasingly rely on computational methods to reduce time and financial costs. Deep learning has shown excellent performance in areas such as computer vision and natural language processing and has been applied to computer-aided drug design. Deep learning models can process large-scale, complex biochemical data and learn and exploit the implicit patterns in it. Deep neural networks can process the properties of the atoms and bonds of compounds while retaining structural and topological information, which allows them to capture implicit features. Compared with traditional machine learning algorithms, deep learning-based virtual screening methods can capture more complex representations of drug and target knowledge and discover hidden associations. However, performance in virtual screening varies across deep learning models. In addition, deep learning methods have encountered bottlenecks in ligand screening for large proteins, and how to solve this problem effectively has become a hot research topic in virtual screening.
This review provides a comprehensive analysis and description of the latest technology and research progress in this field. It first introduces the basic concepts of virtual screening, commonly used databases, and data representation methods. Then, state-of-the-art deep learning-based computational methods for drug-target interaction and binding affinity prediction are analyzed and compared, including a categorization of the methods and a discussion of their results. To further address the difficulty of virtual screening of large proteins, several representative methods are compared quantitatively on datasets of different sizes. In closing, the major issues and future directions are discussed, including novel computational strategies and ideas for applying our knowledge of virtual screening.
The review covers the problem definition, traditional machine learning methods, task challenges, databases for virtual screening, and data representations. It highlights the challenges of data complexity, data annotation, feature representation, and model selection in deep learning-based virtual screening. It presents the databases used in virtual screening, grouped into drug-centered, protein-centered, and integrated databases, and the datasets used for the drug-target interaction (DTI) and drug-target affinity (DTA) tasks, including DUD-E, MUV, BindingDB, DrugBank, Davis, KIBA, and PDBBind. It also describes the data representations for drugs and proteins, including strings, fingerprints, 2D graphs, 3D graphs, and pre-trained model embeddings. The review concludes that deep learning-based virtual screening has great potential and requires further research and improvement.