Interpretable Machine Learning Framework to Predict the Glass Transition Temperature of Polymers

10 April 2024 | Md. Jamal Uddin, Jitang Fan
This study proposes an interpretable machine learning framework to predict the glass transition temperature (Tg) of polymers. The dataset comprises 7174 polymer samples, each represented by Morgan fingerprints and molecular descriptors. Preprocessing consisted of scaling the features, removing low-variance features, and using Pearson correlation to eliminate highly correlated features. Recursive feature elimination was then applied to select the most significant of the remaining features.

Nine machine learning models were employed to predict Tg: decision trees, support vector machines, AdaBoost, K-nearest neighbors, XGBoost, random forests, light gradient boosting, histogram gradient boosting, and extra trees. Hyperparameters were tuned using grid search, and model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). The extra trees regressor achieved the best results, with an R² of 88.01%, an MAE of 26.186, and an RMSE of 38.839.

Statistical methods and SHAP analysis were used to identify the significant features, revealing that BalabanJ, SlogP_VSA1, and MaxEStateIndex were the most influential. According to the authors, the framework can be adapted to predict other polymer properties at low computational cost. The study highlights the importance of feature selection and interpretable models in polymer property prediction, and shows that machine learning can forecast Tg accurately enough to support polymer design.
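As an illustration of two steps named in the summary, the sketch below shows a low-variance and Pearson-correlation feature filter, followed by the three evaluation metrics (MAE, RMSE, R²). It is a minimal, standard-library-only sketch and not the authors' code: the function names, the toy data, and the thresholds (`var_threshold=0.01`, `corr_threshold=0.9`) are assumptions, since the summary does not state the values actually used. Scaling, recursive feature elimination, and model training are omitted.

```python
import math


def variance(col):
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def pearson(a, b):
    """Pearson correlation coefficient between two non-constant columns."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def filter_features(columns, var_threshold=0.01, corr_threshold=0.9):
    """Return the names of features that survive both filters.

    columns maps a feature name to its list of values.
    Both thresholds are illustrative assumptions.
    """
    # Step 1: drop features whose variance is at or below the threshold.
    varied = [n for n, col in columns.items() if variance(col) > var_threshold]
    # Step 2: keep a feature only if it is not highly correlated
    # with any feature that has already been kept.
    selected = []
    for name in varied:
        if all(abs(pearson(columns[name], columns[kept])) < corr_threshold
               for kept in selected):
            selected.append(name)
    return selected


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e ** 2 for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


# Toy data (hypothetical): "b" nearly duplicates "a", and "c" is constant.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [5.0, 5.0, 5.0, 5.0],
    "d": [4.0, 1.0, 3.0, 2.0],
}
kept_features = filter_features(toy)
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

On the toy data, the constant column `c` fails the variance filter and `b` is dropped for being almost perfectly correlated with `a`. The filter keeps whichever feature of a correlated pair it meets first, which is one of several reasonable conventions.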