[slides] Extrapolative prediction of small-data molecular property using quantum mechanics-assisted machine learning

This study presents a comprehensive benchmark of the extrapolative performance of machine learning (ML) and deep learning (DL) models for molecular property prediction. The benchmark covers 12 datasets of organic molecular properties and shows that extrapolation is especially difficult for small datasets: conventional ML models degrade markedly when predicting beyond the range of property values and molecular structures seen in training. To address this, the authors introduce QMex, a dataset of quantum-mechanical (QM) descriptors, together with an interactive linear regression (ILR) model that adds interaction terms between QM descriptors and molecular structure information in the form of chemical categories. QMex provides 22 QM descriptors for approximately 26,000 molecules and outperforms existing QM-based descriptor sets. The QMex-based ILR achieves state-of-the-art extrapolative performance while remaining interpretable, outperforming existing models in both property-range and molecular-structure extrapolation, particularly on small datasets. These results indicate that QM-based models, and QMex-ILR in particular, are better suited to extrapolation than structure-based models: they are robust to structure bias in the training data and predict molecular properties accurately even when experimental data are limited, which makes QMex-ILR a valuable tool for materials discovery and design.
The study also discusses the potential of QMex-based models to predict a wide range of molecular properties and their applicability to larger material systems. Overall, it shows that QM-based models are effective for extrapolative prediction and that QMex descriptors improve the accuracy and reliability of molecular property predictions.
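The ILR idea summarized above can be illustrated with a minimal sketch, which is not the authors' implementation. Everything specific here is an assumption made for illustration: the model form (intercept, descriptor terms, category indicators, and descriptor-by-category interaction terms, with category 0 as the reference level), the tiny ridge penalty used for numerical stability, the hypothetical helper names `ilr_features`, `fit_ridge` and `property_range_split`, and the synthetic two-descriptor, three-category data, which is not QMex data. The split mimics a property-range extrapolation test by training on the lowest target values and testing on the highest.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (descriptor vector, category index, target value).
Sample = Tuple[Tuple[float, ...], int, float]


def ilr_features(x: Sequence[float], category: int, n_categories: int) -> List[float]:
    """Build one design row: intercept, descriptors, category indicators and
    descriptor-by-category interaction terms (category 0 is the reference level)."""
    indicators = [1.0 if category == k else 0.0 for k in range(1, n_categories)]
    interactions = [xi * zk for xi in x for zk in indicators]
    return [1.0, *x, *indicators, *interactions]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(rows: List[List[float]], y: List[float], lam: float = 1e-6) -> List[float]:
    """Least squares with a tiny ridge term: (X^T X + lam*I) w = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    return sum(wi * ri for wi, ri in zip(w, row))


def property_range_split(samples: List[Sample], train_frac: float = 0.8):
    """Train on the lowest-target samples, test on the highest (range extrapolation)."""
    ordered = sorted(samples, key=lambda s: s[2])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def demo(n_samples: int = 60, seed: int = 0) -> float:
    """Fit the ILR sketch on synthetic data; return the MAE on the extrapolated test set."""
    rng = random.Random(seed)
    slopes = (2.0, 3.0, 0.5)  # category-dependent effect of the first descriptor
    n_categories = len(slopes)
    samples: List[Sample] = []
    for _ in range(n_samples):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        c = rng.randrange(n_categories)
        y = 1.0 + slopes[c] * x[0] - 1.0 * x[1] + 0.3 * c
        samples.append((x, c, y))
    train, test = property_range_split(samples)
    w = fit_ridge([ilr_features(x, c, n_categories) for x, c, _ in train],
                  [y for _, _, y in train])
    errors = [abs(predict(w, ilr_features(x, c, n_categories)) - y) for x, c, y in test]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(f"extrapolation MAE on synthetic data: {demo():.2e}")
```

The interaction terms let each chemical category have its own descriptor slopes while the model stays a single linear form, so every fitted coefficient remains directly readable. Because the synthetic targets are exactly representable by this model and noise-free, the sketch extrapolates almost perfectly; real descriptors and experimental measurements will not behave this cleanly.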

2024 | Hajime Shimakawa, Akiko Kumada and Masahiro Sato