Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

This paper presents a method for generating focused molecule libraries for drug discovery with recurrent neural networks (RNNs). The authors train an RNN as a generative model of molecular structures, analogous to statistical language models in natural language processing. Molecules sampled from the trained model have properties that correlate well with those of the training data. To enrich libraries with molecules active against a specific biological target, the model is fine-tuned on small sets of known actives. Performance is evaluated by how well the model reproduces hold-out test sets of known biologically active molecules. The results show that the model generates diverse, valid molecules and, once fine-tuned, produces novel molecules with the desired activity. The authors also demonstrate that the model can simulate the complete de novo drug design cycle of structure generation, scoring, and retraining without a starting set of known actives. The method is conceptually orthogonal to established approaches, simple to set up and use, and does not rely on hand-encoded expert knowledge; its main weakness is interpretability. The authors suggest that deep neural networks can complement established approaches in drug discovery, but caution that generating almost-correct molecules is not enough, because drug discovery is a "needle in a haystack" problem.
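The generate, score, and retrain cycle described above can be illustrated with a toy sketch. This is not the authors' implementation: a character-level bigram model stands in for the RNN, molecules are assumed to be written as SMILES-like strings, and `BigramModel`, `is_valid`, `toy_score`, `design_cycle`, and `SEED_SMILES` are all hypothetical names introduced here. The validity check and the scoring function are crude placeholders for a real structure parser and a target-prediction model.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical sequence-boundary tokens

# Arbitrary illustrative strings, not data from the paper.
SEED_SMILES = ["CCO", "CCN", "c1ccccc1", "CC(=O)O",
               "CC(C)N", "NCCO", "c1ccncc1", "CC(=O)N"]


class BigramModel:
    """Toy character-level language model; a stand-in for the RNN."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, strings, weight=1.0):
        # Accumulate weighted transition counts. Calling this again on a
        # small focused set plays the role of fine-tuning.
        for s in strings:
            tokens = [START] + list(s) + [END]
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += weight

    def sample(self, rng, max_len=40):
        # Draw one string character by character from the learned
        # transitions. Assumes train() has been called at least once.
        out, prev = [], START
        for _ in range(max_len):
            choices = self.counts[prev]
            nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
            if nxt == END:
                break
            out.append(nxt)
            prev = nxt
        return "".join(out)


def is_valid(s):
    # Crude placeholder for a real parser: non-empty, balanced parentheses.
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


def toy_score(s):
    # Placeholder for a target-prediction model: counts nitrogen atoms.
    return s.count("N")


def design_cycle(model, rng, rounds=5, n_samples=200, keep=20):
    """Repeat generation, scoring, and retraining; return mean top scores."""
    history = []
    for _ in range(rounds):
        samples = {model.sample(rng) for _ in range(n_samples)}
        valid = sorted(s for s in samples if is_valid(s))  # fixed order
        top = sorted(valid, key=toy_score, reverse=True)[:keep]
        model.train(top, weight=5.0)  # retrain on the best candidates
        history.append(sum(map(toy_score, top)) / max(len(top), 1))
    return history


rng = random.Random(0)
model = BigramModel()
model.train(SEED_SMILES)          # "pretraining" on the general set
print(design_cycle(model, rng))   # mean toy score of kept strings per round
```

The loop needs no known actives to start: the scoring function alone decides which generated strings are fed back into training, which mirrors the cycle the summary describes. In a realistic setting the bigram model would be replaced by a trained RNN and the two placeholders by a cheminformatics parser and a learned activity predictor.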

December 28, 2017 | Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, and Mark P. Waller