8 Aug 2024 | Janghoon Ock, Srivathsan Badrinarayanan, Rishikesh Magar, Akshay Antony, and Amir Barati Farimani
This paper introduces a multimodal learning approach, termed graph-assisted pretraining, to improve how accurately language models predict adsorption configurations in catalysis. The method combines a graph neural network (GNN) with a transformer-based language model to sharpen prediction of adsorption energy, a key reactivity descriptor in catalyst screening, reducing the mean absolute error (MAE) of energy prediction by about 10% and transferring across datasets. The framework pairs the CatBERTa model for text processing with the EquiformerV2 model for graph encoding: the text encoder is first pretrained in a self-supervised manner to align its embeddings with the graph embeddings, and then fine-tuned on energy labels derived from DFT calculations, as sketched below.
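A minimal sketch of that alignment stage, assuming a CLIP-style contrastive objective between paired text and graph embeddings; the `AlignmentHead` module, the projection sizes, and the `clip_style_loss` helper are illustrative assumptions, and the random features stand in for CatBERTa's pooled text embedding and an EquiformerV2 graph embedding rather than the paper's exact implementation.

```python
# Sketch of graph-assisted pretraining: align text and graph embeddings
# in a shared space so the text encoder absorbs structural information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Projects both modalities into a shared space for alignment."""
    def __init__(self, text_dim=768, graph_dim=256, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.graph_proj = nn.Linear(graph_dim, shared_dim)

    def forward(self, text_emb, graph_emb):
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        g = F.normalize(self.graph_proj(graph_emb), dim=-1)
        return t, g

def clip_style_loss(t, g, temperature=0.07):
    """Symmetric InfoNCE loss: matching text/graph pairs attract,
    mismatched pairs within the batch repel."""
    logits = t @ g.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(t.size(0), device=t.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# One pretraining step, with encoder outputs stubbed as random features:
B = 8
text_emb = torch.randn(B, 768)    # stand-in for CatBERTa text embedding
graph_emb = torch.randn(B, 256)   # stand-in for EquiformerV2 graph embedding
head = AlignmentHead()
t, g = head(text_emb, graph_emb)
loss = clip_style_loss(t, g)
loss.backward()
```

In the subsequent fine-tuning stage, the aligned text encoder would be topped with a small regression head and trained on DFT energy labels, e.g. with an L1 (MAE) objective.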
The study also explores using large language models (LLMs) to generate text inputs for the predictive model from chemical composition and surface orientation alone, without exact atomic positions, so that energies can be predicted even when the full structure of the adsorbate-catalyst configuration is unknown. The method is validated on the OC20 dataset, with over 1.2 million DFT relaxations, and the OC20-Dense dataset, with 995 distinct adsorbate-catalyst pairs. The results show that the model accurately predicts adsorption energies even when exact structures are not known, highlighting the value of multimodal learning in catalysis and the potential of language models for energy prediction without geometric information.
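To illustrate the structure-free input path, a hypothetical sketch of how a text input could be assembled from composition and surface orientation only; the `describe_configuration` helper and its template wording are assumptions for illustration (the paper prompts an LLM to generate such descriptions), not the actual prompt or input format.

```python
# Hypothetical sketch: build a structure-free text input for the
# fine-tuned text model from composition and surface orientation only.
# No atomic coordinates are used; an LLM (or a fixed template, as here)
# turns the metadata into the kind of string the text encoder consumes.

def describe_configuration(adsorbate: str, catalyst: str, miller: tuple) -> str:
    """Return a textual description of an adsorbate-catalyst pair
    using only composition and surface orientation (no positions)."""
    h, k, l = miller
    return (
        f"The adsorbate {adsorbate} is placed on the {catalyst} "
        f"({h}{k}{l}) surface. Predict the adsorption energy of this "
        f"configuration from the description alone."
    )

# Example: *CO on a Cu(111) surface.
text_input = describe_configuration("*CO", "Cu", (1, 1, 1))
print(text_input)
# The downstream step (not shown) would tokenize this string and feed it
# to the fine-tuned text model to regress the adsorption energy.
```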