Projecting Molecules into Synthesizable Chemical Spaces

7 Jun 2024 | Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma
This paper introduces a framework for projecting molecules into synthesizable chemical spaces, addressing the problem that molecules proposed by generative models are often difficult or impossible to synthesize. The framework represents synthetic pathways in postfix notation, so the model can generate molecules that are structurally and functionally similar to the input while ensuring synthetic accessibility. The model is transformer-based: an encoder converts the input molecular graph into atom embeddings, and a decoder autoregressively generates the postfix notation of a synthesis, assembling molecules from purchasable building blocks and expert-defined reaction rules. At each step the model predicts the type of the next token (building block, reaction, or end). For a building-block token, it retrieves the corresponding molecule by nearest-neighbor search; for a reaction token, it predicts the reaction type from the current context. The model is evaluated on bottom-up synthesis planning, structure-based drug design, goal-directed generation, and hit expansion. The results show higher success rates, reconstruction rates, and similarity scores than existing methods, demonstrating that the model can generate synthesizable analogs of unsynthesizable molecules. Projecting molecules produced by structure-based drug design models and goal-directed generative models also leads to better binding scores and structural similarity.
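To make the postfix representation concrete, here is a minimal sketch of how such a pathway could be evaluated with a stack: building-block tokens are pushed, and each reaction token pops its reactants and pushes the product. The class names, the toy reaction rules, and the string-based "molecules" are all hypothetical stand-ins for illustration; they are not the paper's code, and a real system would apply reaction templates to actual molecular structures.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class BuildingBlock:
    name: str  # hypothetical stand-in for a purchasable molecule


@dataclass(frozen=True)
class Reaction:
    name: str
    arity: int                 # number of reactants the rule consumes
    apply: Callable[..., str]  # toy product constructor, not real chemistry


Token = Union[BuildingBlock, Reaction]


def evaluate_postfix(tokens: List[Token]) -> str:
    """Evaluate a postfix pathway with a stack and return the final product."""
    stack: List[str] = []
    for tok in tokens:
        if isinstance(tok, BuildingBlock):
            # A building block is an operand: push it.
            stack.append(tok.name)
        else:
            if len(stack) < tok.arity:
                raise ValueError(f"{tok.name} needs {tok.arity} reactants")
            # A reaction is an operator: take the top `arity` intermediates,
            # in the order they were pushed, and push the product back.
            start = len(stack) - tok.arity
            reactants = stack[start:]
            del stack[start:]
            stack.append(tok.apply(*reactants))
    if len(stack) != 1:
        raise ValueError("a complete pathway must leave exactly one product")
    return stack[0]


# Hypothetical toy reaction rules (assumptions for illustration only).
amide = Reaction("amide_coupling", 2, lambda a, b: f"amide({a},{b})")
deprotect = Reaction("deprotection", 1, lambda a: f"deprotected({a})")

# Postfix order: both reactants first, then the reactions that consume them.
pathway = [BuildingBlock("acid_A"), BuildingBlock("amine_B"), amide, deprotect]
product = evaluate_postfix(pathway)
print(product)
```

Because operands always precede the operator that consumes them, a linear token sequence is enough to describe a branching synthesis tree, which is what lets an autoregressive decoder emit it one token at a time.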
The framework provides a promising approach for efficient exploration of chemical spaces while respecting synthetic feasibility, enhancing the potential of machine learning in molecular design.
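The decoding step described in the summary (predict the token type, then either retrieve a building block by nearest-neighbor search or pick a reaction type) can be sketched as follows. This is a rough illustration under stated assumptions: the scores are plain dictionaries standing in for model outputs, the three-dimensional vectors stand in for molecular fingerprints, and cosine similarity is an assumed choice of metric, since the summary does not say which one is used. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_building_block(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> str:
    """Return the library entry whose vector is most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))


def decode_step(
    type_scores: Dict[str, float],
    bb_query: Sequence[float],
    reaction_scores: Dict[str, float],
    library: Dict[str, Sequence[float]],
) -> Tuple[str, Optional[str]]:
    """One decoding step: choose the token type, then fill in its value."""
    token_type = max(type_scores, key=type_scores.get)
    if token_type == "building_block":
        # Nearest-neighbor lookup maps a predicted vector to a real,
        # purchasable molecule from the library.
        return token_type, retrieve_building_block(bb_query, library)
    if token_type == "reaction":
        # Reactions form a small fixed set, so a plain argmax suffices here.
        return token_type, max(reaction_scores, key=reaction_scores.get)
    return "end", None


# Hypothetical toy library and model outputs (assumptions for illustration).
library = {
    "acid_A": [1.0, 0.0, 0.0],
    "amine_B": [0.0, 1.0, 0.0],
    "aldehyde_C": [0.0, 0.0, 1.0],
}
reaction_scores = {"amide_coupling": 1.5, "reductive_amination": 0.2}

step = decode_step(
    {"building_block": 2.1, "reaction": 0.3, "end": -1.0},
    [0.1, 0.9, 0.2],
    reaction_scores,
    library,
)
print(step)
```

Retrieving from a fixed library rather than generating a structure freely means every building-block token corresponds to a molecule that can actually be purchased.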