Property-guided generation of complex polymer topologies using variational autoencoders

2024 | Shengli Jiang, Adji Bousso Dieng & Michael A. Webb
This study introduces a variational autoencoder (VAE) approach for generating complex polymer topologies with desired properties. The method draws on a dataset of 1342 polymers spanning linear, cyclic, branch, comb, star, and dendrimer architectures, and uses a multi-task learning framework that reconstructs and classifies polymer topologies while predicting their dilute-solution radii of gyration. The framework can generate polymer topologies with a target size, and the generated polymers are validated through molecular simulations. These capabilities are then used to contrast the rheological properties of topologically distinct polymers that show similar dilute-solution behavior.
The topology of a polymer chain significantly influences its properties and those of derivative materials. Among natural polymers, linear amylose forms dense aggregates with low aqueous solubility, whereas the highly branched structure of amylopectin enhances solubility. Among synthetic polymers, branching in low-density polyethylene improves processability, whereas linear high-density polyethylene has superior mechanical strength and chemical resistance. Advances in synthetic methodology have enabled the creation of polymers with complex topologies, such as stars, combs, branches, hyperbranches, dendrimers, rings, and brushes. Even so, establishing quantitative relationships between polymer topology and material properties remains challenging. Experimental and computational investigations have both improved understanding of how topology affects performance in areas such as enhanced oil recovery, coatings and adhesives, rheology and fluid dynamics, energy storage, and biomedical applications. However, labor-intensive and costly synthesis and characterization limit experimental studies to a small set of systems, while computational studies are often restricted to specific topologies because of computational cost and uncertainty in how to compare diverse topologies.
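To make the characteristic size mentioned above concrete, the following is a minimal sketch of how a radius of gyration can be computed from bead coordinates, assuming equal-mass beads as is common in coarse-grained models. The function names and the toy three-bead conformation are hypothetical illustrations, not taken from the study, and the study's own simulation and averaging protocol may differ.

```python
import math

def radius_of_gyration(coords):
    """Return Rg for one conformation of equal-mass beads (assumption)."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one bead")
    # For equal masses, the center of mass is the mean bead position.
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    # Rg^2 is the mean squared distance of the beads from the center of mass.
    sq = sum(sum((c[k] - com[k]) ** 2 for k in range(3)) for c in coords) / n
    return math.sqrt(sq)

def mean_squared_rg(frames):
    """Average Rg^2 over a list of conformations (e.g. trajectory frames)."""
    if not frames:
        raise ValueError("need at least one frame")
    return sum(radius_of_gyration(f) ** 2 for f in frames) / len(frames)

# Hypothetical toy conformation: three beads on a line, one unit apart.
rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(rod)
```

For the toy rod, the squared distances from the center are 1, 0, and 1, so Rg^2 is 2/3. In practice the quantity of interest is an average over many sampled conformations, which `mean_squared_rg` sketches.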
Recent advances in machine learning have spurred developments in polymer design. Generative machine learning models, such as variational autoencoders (VAEs), are particularly promising for chemical design. VAEs encode complex data into lower-dimensional latent spaces and have been used to generate small molecules and to address problems in polymer science.
In this study, a multi-task VAE is developed to generate polymers with specified topology and desired characteristics. The model is trained on coarse-grained molecular dynamics (MD) data for over 1300 polymers of various topologies. Input and encoding strategies are critically assessed by training several models to reconstruct polymer topology while performing two auxiliary tasks: estimating the characteristic size and classifying the topology. The auxiliary tasks enhance the physical interpretability of the learned latent space. The most effective generative modeling framework, TopoGNN, incorporates both graph and topological descriptor features. It is used to produce sets of topologically diverse polymers that have the same characteristic size in dilute solution but contrasting rheological behavior. This work expands the utility of generative modeling for polymer design and demonstrates how such algorithms can facilitate controlled studies across complex, topologically diverse polymers. The study also highlights the effectiveness of the workflow for TopoGNN to produce ...
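The multi-task objective described above (reconstruct the topology, regress the characteristic size, classify the topology, and regularize the latent space) can be sketched as a weighted sum of four terms. This is only an illustrative sketch under stated assumptions: the specific loss forms (binary cross-entropy over adjacency entries, squared error, categorical cross-entropy), the weights `w_kl`, `w_reg`, `w_cls`, and all function names are hypothetical and are not confirmed by the source as the authors' formulation.

```python
import math

def bce(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 adjacency entries and edge probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def cross_entropy(true_class, class_probs, eps=1e-12):
    """Negative log-probability assigned to the true topology class."""
    return -math.log(max(class_probs[true_class], eps))

def multitask_loss(adj_true, adj_prob, mu, logvar, rg_true, rg_pred,
                   cls_true, cls_prob, w_kl=1.0, w_reg=1.0, w_cls=1.0):
    """Weighted sum of the four terms; the weights are assumed hyperparameters."""
    recon = bce(adj_true, adj_prob)           # topology reconstruction
    kl = kl_to_standard_normal(mu, logvar)    # latent-space regularization
    reg = (rg_true - rg_pred) ** 2            # characteristic-size regression
    cls = cross_entropy(cls_true, cls_prob)   # topology classification
    return recon + w_kl * kl + w_reg * reg + w_cls * cls

# Hypothetical toy inputs: an uninformative decoder and classifier,
# a perfect size prediction, and a posterior equal to the prior.
loss = multitask_loss(
    adj_true=[1, 0, 1], adj_prob=[0.5, 0.5, 0.5],
    mu=[0.0, 0.0], logvar=[0.0, 0.0],
    rg_true=2.0, rg_pred=2.0,
    cls_true=0, cls_prob=[0.5, 0.5],
)
```

In this toy case the KL and regression terms vanish, and the reconstruction and classification terms each contribute ln 2. Sharing one latent vector across all heads is what would let the auxiliary tasks shape the latent space, which matches the interpretability argument in the text.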