Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules — 5 Dec 2017 | Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik
This paper presents a method for automatic chemical design using a data-driven continuous representation of molecules. The approach involves training a deep neural network on a large dataset of chemical structures to create an encoder, decoder, and predictor. The encoder converts discrete molecular representations into continuous vectors, while the decoder translates these vectors back into molecular representations. The predictor estimates chemical properties from the continuous representation.
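The encoder's output can be sketched with the variational autoencoder's reparameterization trick, which keeps a sampled latent vector differentiable with respect to the encoder parameters. The linear maps, dimensions, and variable names below are toy stand-ins for the trained networks, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Hypothetical encoder head: map a molecular feature vector to a
    Gaussian over the latent space (mean and log-variance)."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps; the randomness lives in eps, so z
    stays differentiable with respect to mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy dimensions: a 32-dim feature vector mapped to a 4-dim latent space.
x = rng.standard_normal(32)
W_mu = rng.standard_normal((32, 4)) * 0.1
W_logvar = rng.standard_normal((32, 4)) * 0.1

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
print(z.shape)  # (4,)
```

A decoder and property predictor would consume `z`; here only the encoding step is shown.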
The continuous representation allows for the automatic generation of novel chemical structures through operations in the latent space, such as decoding random vectors, perturbing known structures, or interpolating between molecules. It also enables gradient-based optimization to efficiently guide the search for optimized compounds. The method was tested on drug-like molecules and molecules with fewer than nine heavy atoms.
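The three latent-space operations act directly on latent vectors. A minimal sketch, using a toy dimensionality and random codes in place of real encoded molecules (the paper's latent spaces have 156 and 196 dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # toy latent dimensionality

z_a = rng.standard_normal(d)  # hypothetical latent code of a known molecule
z_b = rng.standard_normal(d)  # latent code of a second molecule

# 1. Random generation: decode a draw from the latent prior.
z_random = rng.standard_normal(d)

# 2. Perturbation: small Gaussian noise around a known molecule.
z_perturbed = z_a + 0.1 * rng.standard_normal(d)

# 3. Interpolation: points on the straight line between two molecules.
alphas = np.linspace(0.0, 1.0, 5)
path = np.array([(1 - a) * z_a + a * z_b for a in alphas])

print(path.shape)  # (5, 8)
```

Each resulting vector would then be passed through the decoder to recover a molecular structure.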
The continuous representation offers several advantages. It eliminates the need for hand-specified mutation rules, allows gradient-based optimization for larger jumps in chemical space, and leverages large sets of unlabeled compounds to build an implicit library. The method also enables the use of Bayesian optimization to select compounds likely to be informative about the global optimum.
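Gradient-based optimization in the latent space can be sketched with a toy differentiable objective; the concave quadratic `f` below is a hypothetical stand-in for the trained property predictor, not the paper's network:

```python
import numpy as np

# Hypothetical optimum of the stand-in property surface.
target = np.array([1.0, -2.0, 0.5])

def f(z):
    """Toy differentiable property predictor over latent codes."""
    return -np.sum((z - target) ** 2)

def grad_f(z):
    """Analytic gradient of the toy predictor."""
    return -2.0 * (z - target)

z = np.zeros(3)            # hypothetical starting latent code
for _ in range(200):       # plain gradient ascent in latent space
    z = z + 0.05 * grad_f(z)

print(np.round(z, 3))  # converges to `target`
```

With a real predictor the gradient would come from backpropagation through the network, but the update rule is the same.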
The paper introduces a variational autoencoder (VAE) to ensure that points in the latent space correspond to valid molecules. The VAE was trained on two datasets: QM9 (molecules with fewer than nine heavy atoms) and ZINC (drug-like molecules). The latent space representations for these datasets had 156 and 196 dimensions, respectively.
The results show that the continuous latent space allows for interpolation of molecules by following the shortest Euclidean path between their representations. The VAE was able to generate realistic-looking molecules with properties consistent with the training data. The method was also used for property prediction, where a multi-layer perceptron was trained to predict properties from the latent representation.
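Property prediction from latent codes can be sketched as a one-hidden-layer perceptron trained by gradient descent; the synthetic data, layer sizes, and hyperparameters below are toy assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: latent codes Z and a synthetic scalar property y(Z).
Z = rng.standard_normal((200, 4))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1]

# One-hidden-layer perceptron, trained by full-batch gradient descent.
W1 = rng.standard_normal((4, 16)) * 0.3
b1 = np.zeros(16)
w2 = rng.standard_normal(16) * 0.3
b2 = 0.0
lr = 0.05

for _ in range(3000):
    h = np.tanh(Z @ W1 + b1)          # hidden activations
    pred = h @ w2 + b2                # predicted property
    err = pred - y                    # residual
    # Backpropagate the mean squared-error loss.
    gw2 = h.T @ err / len(y)
    gb2 = err.mean()
    gh = np.outer(err, w2) * (1 - h ** 2)
    gW1 = Z.T @ gh / len(y)
    gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2

mse = np.mean((np.tanh(Z @ W1 + b1) @ w2 + b2 - y) ** 2)
print(round(float(mse), 4))
```

The trained map from latent code to property is what makes gradient-guided search in the latent space possible.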
The paper also discusses the optimization of molecules in the latent space using a Gaussian process (GP) surrogate model. The objective function was 5 × QED − SAS, where QED is a quantitative estimate of drug-likeness and SAS is a synthetic accessibility score, so the objective favors drug-like molecules that are also easy to synthesize. The GP-guided search outperformed baseline methods in terms of the percentile scores of the molecules it found.
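The GP-guided search can be sketched in miniature: fit a Gaussian process to a few evaluated latent points, then pick the next candidate where the surrogate is promising. The 1-D objective, RBF kernel, and upper-confidence-bound acquisition below are illustrative assumptions standing in for the paper's Bayesian optimization setup:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

# Hypothetical objective over a 1-D latent slice, standing in for
# 5 * QED - SAS evaluated on decoded molecules.
def objective(Z):
    return np.sin(3 * Z[:, 0]) - 0.5 * Z[:, 0] ** 2

Z_train = rng.uniform(-2, 2, size=(8, 1))   # latent points evaluated so far
y_train = objective(Z_train)

# GP posterior mean and variance on a dense grid of candidates.
Z_cand = np.linspace(-2, 2, 200)[:, None]
K = rbf(Z_train, Z_train) + 1e-6 * np.eye(len(Z_train))
K_s = rbf(Z_cand, Z_train)
mu = K_s @ np.linalg.solve(K, y_train)
var = 1.0 - np.einsum('ij,jk,ik->i', K_s, np.linalg.inv(K), K_s)

# Upper-confidence-bound acquisition: sample where mean + std is high.
ucb = mu + np.sqrt(np.maximum(var, 0.0))
z_next = Z_cand[np.argmax(ucb)]
print(z_next)
```

The selected `z_next` would be decoded into a molecule, scored, and added to the training set for the next round.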
The paper concludes that the proposed method offers a new approach for exploring chemical space, enabling efficient and effective molecular design by combining continuous representations with gradient-based optimization. Suggested future directions include graph-based autoencoders and adversarial networks for sequence generation.