Understanding Interpretable Machine Learning Framework to Predict the Glass Transition Temperature of Polymers

This study presents an interpretable machine learning framework to predict the glass transition temperature (Tg) of polymers. The research used a dataset of 7174 samples, in which polymers were represented by Morgan fingerprints and molecular descriptors. The dataset was preprocessed by scaling, removing low-variance features, and using Pearson correlation to exclude highly correlated features. The most significant features were then selected using recursive feature elimination. Nine machine learning techniques (decision trees, support vector machines, AdaBoost, K-nearest neighbors, extreme gradient boosting, random forests, light gradient boosting, histogram gradient boosting, and extra trees) were employed to predict Tg, with hyperparameters tuned for each model. The extra trees regressor performed best, with an R² of 88.01%, an MAE of 26.186, and an RMSE of 38.839. Statistical methods and SHAP were used to identify the most influential features, which included BalabanJ, SlogP_VSA1, and MaxEStateIndex. The study demonstrates the effectiveness of machine learning in predicting Tg and provides insight into the importance of specific features, contributing to the design of polymers with desired properties.
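To make two of the steps above concrete, here is a minimal sketch in plain Python of the variance and Pearson-correlation filters and of the three reported metrics (R², MAE, RMSE). It is not the authors' code: the function names, the thresholds `var_min` and `corr_max`, and the toy values are assumptions chosen for illustration, and the summary does not state which cutoffs the study used. Scaling, recursive feature elimination, and model training are left out.

```python
import math


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def filter_features(columns, var_min=1e-3, corr_max=0.9):
    """Return the names of the features that survive both filters.

    columns: dict mapping feature name -> list of values.
    var_min and corr_max are illustrative thresholds (assumptions).
    """
    kept = []
    for name, values in columns.items():
        if variance(values) < var_min:
            continue  # near-constant feature carries little information
        if any(abs(pearson(values, columns[k])) > corr_max for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept


def regression_metrics(y_true, y_pred):
    """Compute R², MAE, and RMSE for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Toy descriptor columns (hypothetical names and values).
    toy = {
        "desc_a": [1.0, 2.0, 3.0, 4.0],
        "desc_b": [2.0, 4.0, 6.0, 8.1],  # nearly a multiple of desc_a
        "desc_c": [5.0, 5.0, 5.0, 5.0],  # constant
        "desc_d": [4.0, 1.0, 3.0, 2.0],
    }
    print(filter_features(toy))
    print(regression_metrics([100, 150, 200, 250], [110, 140, 210, 240]))
```

The correlation filter here is greedy: of two highly correlated features it keeps whichever comes first, which is one common convention among several. Note that R² is returned as a fraction, so the reported 88.01% would correspond to 0.8801 on this scale.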

Interpretable Machine Learning Framework to Predict the Glass Transition Temperature of Polymers

10 April 2024 | Md. Jamal Uddin and Jitang Fan