29 Mar 2019 | Wengong Jin, Regina Barzilay, Tommi Jaakkola
This paper introduces a junction tree variational autoencoder (JT-VAE) for generating molecular graphs. The main contribution is a new generative model that directly constructs molecular graphs by first generating a tree-structured scaffold of chemical substructures and then combining them into a molecule using a graph message passing network. This approach allows for incremental molecule expansion while maintaining chemical validity at each step. The model outperforms previous state-of-the-art baselines in multiple tasks, including molecular generation and optimization.
The JT-VAE extends the variational autoencoder to molecular graphs by introducing a suitable encoder and decoder. It represents molecules as a combination of a tree structure (junction tree) and a graph structure. The tree structure encodes the scaffold of subgraph components, while the graph structure captures fine-grained connectivity. The model generates molecular graphs in two phases: first, a tree-structured scaffold is generated, and then the subgraphs are assembled into a complete molecular graph.
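For reference, the standard VAE training objective that this kind of model builds on can be written as follows. This is a sketch in our own notation: writing the latent code as a tree part $z_T$ and a graph part $z_G$ is an assumption made here to mirror the two encoders, and the paper's exact loss (for example, any weighting on the KL term) may differ.

```latex
\mathcal{L}(x) = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
               - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big),
\qquad z = [z_T, z_G]
```

Under this reading, the first term rewards accurate reconstruction of the molecule from its latent code, and the second keeps the encoder's distribution close to the prior so that sampling from the prior yields meaningful molecules.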
The tree decomposition algorithm identifies valid chemical substructures (rings, bonds, and atoms) and forms a junction tree to represent the molecule. The graph encoder and tree encoder generate latent representations of the molecular graph and junction tree, respectively. The tree decoder reconstructs the junction tree from the latent representation, and the graph decoder assembles the subgraphs into a complete molecular graph.
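The decomposition step can be illustrated with a minimal sketch. It is a simplification, not the authors' implementation: ring membership is assumed to be given as input (a real pipeline would use a cheminformatics toolkit for ring perception), every non-ring bond becomes its own cluster, and the tree is obtained as a spanning tree over clusters that share atoms. The function name `tree_decompose` and its input format are hypothetical.

```python
from itertools import combinations


def tree_decompose(bonds, rings):
    """Simplified junction-tree construction (illustrative only).

    bonds: list of (atom_i, atom_j) index pairs.
    rings: list of atom-index collections, one per ring (assumed given).
    Returns (clusters, tree_edges), where clusters is a list of frozensets
    of atom indices and tree_edges is a list of (cluster_i, cluster_j).
    """
    ring_sets = [frozenset(r) for r in rings]

    # A bond whose two atoms lie in the same ring is covered by that ring.
    ring_bonds = set()
    for ring in ring_sets:
        for a, b in bonds:
            if a in ring and b in ring:
                ring_bonds.add(frozenset((a, b)))

    # Clusters: every ring, plus every bond not covered by a ring.
    clusters = list(ring_sets)
    for a, b in bonds:
        if frozenset((a, b)) not in ring_bonds:
            clusters.append(frozenset((a, b)))

    # Cluster graph: connect clusters that share at least one atom,
    # weighted by the number of shared atoms.
    candidates = []
    for i, j in combinations(range(len(clusters)), 2):
        shared = len(clusters[i] & clusters[j])
        if shared:
            candidates.append((shared, i, j))
    candidates.sort(key=lambda e: -e[0])  # heaviest edges first (stable)

    # Kruskal-style spanning tree using a small union-find.
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_edges = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree_edges.append((i, j))
    return clusters, tree_edges


# Ethylbenzene-like skeleton: a six-membered ring (atoms 0-5) with a
# two-atom chain (atoms 6 and 7) attached at atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
rings = [[0, 1, 2, 3, 4, 5]]
clusters, tree_edges = tree_decompose(bonds, rings)
for index, cluster in enumerate(clusters):
    print(index, sorted(cluster))
print(tree_edges)
```

In this example the ring and the two chain bonds become three tree nodes connected in a line. The spanning-tree step only matters when the cluster graph contains cycles, for instance when three or more bonds meet at a single atom.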
The model is evaluated on multiple tasks, including molecule reconstruction, Bayesian optimization, and constrained molecule optimization. It produces 100% valid molecules when sampled from a prior distribution and outperforms baselines in generating molecules with desired properties. The model also excels in constrained optimization, where it finds molecules with high property values while maintaining similarity to the original molecule.
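The selection rule behind constrained optimization can be sketched as follows. This summary does not state which similarity measure is used; the sketch assumes Tanimoto similarity over fingerprint bit sets, a common choice in cheminformatics. The function names, the candidate tuple layout, and the toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def best_constrained(original_fp, candidates, threshold):
    """Pick the highest-scoring candidate that stays similar to the original.

    candidates: iterable of (name, fingerprint_bits, property_score).
    Returns the best feasible candidate, or None if none meets the threshold.
    """
    feasible = [
        cand for cand in candidates
        if tanimoto(original_fp, cand[1]) >= threshold
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda cand: cand[2])


# Toy data: bit sets stand in for real molecular fingerprints.
original = {1, 2, 3, 4}
candidates = [
    ("a", {1, 2, 3, 4, 5}, 2.0),  # similarity 4/5
    ("b", {1, 9}, 5.0),           # similarity 1/5, highest score but too different
    ("c", {1, 2, 3}, 1.0),        # similarity 3/4
]
print(best_constrained(original, candidates, threshold=0.6))
```

The threshold makes the trade-off explicit: candidate "b" has the best property score but is rejected for drifting too far from the original molecule, so "a" is selected.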
Compared with previous approaches that rely on SMILES strings, the JT-VAE approach provides a more efficient method for generating molecular graphs while preserving chemical validity. It enables the automated design of molecules based on specific chemical properties, which is crucial for drug discovery. The model's ability to keep intermediate structures chemically valid throughout generation makes it a promising tool for molecular graph generation.