Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

December 28, 2017 | Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, and Mark P. Waller
This research article presents a method for generating focused molecule libraries for drug discovery using recurrent neural networks (RNNs). The authors show that RNNs can be trained as generative models of molecular structures, analogous to statistical language models in natural language processing. Trained on a large set of molecules from the ChEMBL database, the model generates novel molecules whose properties resemble those of the training data. Fine-tuning it on small sets of molecules known to be active against a specific biological target then shifts generation toward molecules likely to share that activity, yielding libraries that are both diverse and focused.
The approach was evaluated with molecules active against two pathogens, the bacterium Staphylococcus aureus and the malaria parasite Plasmodium falciparum. For Staphylococcus aureus, the fine-tuned model reproduced 14% of 6051 test molecules; for Plasmodium falciparum, it reproduced 28% of 1240 test molecules. Generated molecules were also assessed with target prediction models, which classify molecules as active or inactive against a target; by this measure, the focused libraries are enriched with potentially active molecules. Combined with a scoring function, the model can perform the complete de novo drug design cycle and generate large sets of novel candidate molecules.
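To make the language-model framing concrete, the minimal Python sketch below substitutes a character-level n-gram model for the RNN; this is a deliberate simplification, not the authors' architecture. It assumes molecules are written as SMILES strings, pretrains on a general set, "fine-tunes" by continuing training on a small focused set with extra weight, samples new strings, and measures how many held-out molecules are reproduced. The class `CharNGramModel`, the helper `recovery_rate`, the fine-tuning weight, and all molecules are hypothetical toy choices.

```python
import random
from collections import Counter, defaultdict

# ASSUMPTION: molecules are SMILES strings; '^' and '$' never occur inside them.
START, END = "^", "$"


class CharNGramModel:
    """Character-level n-gram language model over SMILES strings.

    Stand-in for the paper's RNN: both estimate P(next character | previous
    characters), but this model only looks back a fixed `order` characters.
    """

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)  # context string -> next-character counts

    def _pairs(self, smiles):
        padded = START * self.order + smiles + END
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def fit(self, molecules, weight=1):
        """Add weighted counts; calling fit again continues training (fine-tuning)."""
        for smi in molecules:
            for ctx, nxt in self._pairs(smi):
                self.counts[ctx][nxt] += weight
        return self

    def prob(self, ctx, nxt):
        options = self.counts.get(ctx, Counter())
        total = sum(options.values())
        return options[nxt] / total if total else 0.0

    def sample(self, rng, max_len=80):
        """Draw one string character by character until END or max_len."""
        ctx, out = START * self.order, []
        while len(out) < max_len:
            chars, weights = zip(*self.counts[ctx].items())
            nxt = rng.choices(chars, weights=weights)[0]
            if nxt == END:
                break
            out.append(nxt)
            ctx = ctx[1:] + nxt
        return "".join(out)


def recovery_rate(generated, held_out):
    """Fraction of held-out molecules that also appear in the generated set.

    Assumes both inputs are already canonical SMILES; a real pipeline would
    canonicalize first so that equivalent strings compare equal.
    """
    held_out = set(held_out)
    return len(held_out & set(generated)) / len(held_out)


# Toy data, made up for illustration (not from ChEMBL or the paper).
general = ["CCO", "CCN", "CC(=O)O", "CCCC", "c1ccccc1", "c1ccccc1O"]
focused = ["c1ccccc1Cl", "c1ccc(Cl)cc1"]
held_out = ["c1ccc(Cl)cc1Cl", "c1ccccc1CCl"]

model = CharNGramModel(order=3).fit(general)  # "pretraining" on the large set
p_before = model.prob("cc1", "C")
model.fit(focused, weight=5)                  # "fine-tuning" on the small active set
p_after = model.prob("cc1", "C")

rng = random.Random(0)
library = {model.sample(rng) for _ in range(500)}
print(f"P('C' | 'cc1'): {p_before:.2f} -> {p_after:.2f}")
print(f"{len(library)} unique strings, recovery = {recovery_rate(library, held_out):.2f}")
```

Fine-tuning here raises P('C' | 'cc1') from 0.00 to 0.42 (5 of 12 weighted counts), pulling samples toward the chlorinated rings of the focused set; sampled strings are not checked for chemical validity. Applied to the reported figures, a recovery rate of 14% on 6051 test molecules corresponds to roughly 847 molecules, and 28% on 1240 to roughly 347, back-calculated from rounded percentages.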
The generated molecules closely resemble the training data, and the model fits naturally into a design-synthesis-test cycle: molecules are generated, scored, and then used to retrain the model, so that each round informs the next. Overall, the study highlights the potential of RNNs in drug discovery as a practical way to create focused libraries of novel molecules with desired properties.
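The generate, score, retrain loop can likewise be sketched in a few lines. In this hypothetical Python example, computational scoring stands in for synthesis and testing, a character-bigram model stands in for the RNN, and `predicted_activity` is a made-up scoring function in place of a real target prediction model; `design_cycle`, `top_k`, and the retraining weight are illustrative assumptions, not parameters from the paper.

```python
import random
from collections import Counter, defaultdict


def train(counts, molecules, weight=1):
    """Add weighted character-bigram counts ('^' = start, '$' = end)."""
    for smi in molecules:
        padded = "^" + smi + "$"
        for cur, nxt in zip(padded, padded[1:]):
            counts[cur][nxt] += weight


def sample(counts, rng, max_len=60):
    """Generate one string by repeatedly sampling the next character."""
    cur, out = "^", []
    while len(out) < max_len:
        chars, weights = zip(*counts[cur].items())
        cur = rng.choices(chars, weights=weights)[0]
        if cur == "$":
            break
        out.append(cur)
    return "".join(out)


def predicted_activity(smiles):
    """HYPOTHETICAL scoring function standing in for a target-prediction model.

    Returns a pseudo-probability of activity; here it simply rewards 'Cl'.
    """
    return min(1.0, smiles.count("Cl") / 2)


def design_cycle(seed_molecules, score, rounds=3, n_samples=200, top_k=10, seed=0):
    """Generate -> score -> retrain on the top-scoring molecules, repeated.

    Returns the mean score of the selected molecules in each round.
    """
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    train(counts, seed_molecules)
    history = []
    for _ in range(rounds):
        library = {sample(counts, rng) for _ in range(n_samples)}
        # Rank by score, breaking ties by the string itself so selection is deterministic.
        selected = sorted(library, key=lambda s: (score(s), s), reverse=True)[:top_k]
        history.append(sum(score(s) for s in selected) / len(selected))
        train(counts, selected, weight=5)  # "retrain" on the most promising designs
    return history


seed_set = ["CCO", "CCN", "CCCl", "CC(=O)O", "c1ccccc1", "c1ccccc1Cl"]
print(design_cycle(seed_set, predicted_activity))
```

Each round shifts the model toward whatever the scoring function rewards, which is why the quality of the scoring model matters as much as the generator; a real pipeline would also discard invalid SMILES before scoring.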