8 Aug 2024 | Janghoon Ock, Srivathsan Badrinarayanan, Rishikesh Magar, Akshay Antony, and Amir Barati Farimani
This paper introduces a multimodal learning approach, termed graph-assisted pretraining, to improve how accurately language models predict adsorption configurations in catalysis. The method combines a graph neural network (GNN) with a transformer-based language model to sharpen prediction of adsorption energy, a key reactivity descriptor in catalyst screening, reducing the mean absolute error (MAE) of energy prediction by about 10% and transferring across datasets. The framework pairs the CatBERTa model for text processing with the EquiformerV2 model for graph encoding: the text encoder is first pretrained in a self-supervised manner to align its embeddings with the graph embeddings, and then fine-tuned on energy labels derived from DFT calculations, as sketched below.
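A minimal sketch of that alignment stage, assuming a CLIP-style contrastive objective between paired text and graph embeddings; the `AlignmentHead` module, the projection sizes, and the `clip_style_loss` helper are illustrative assumptions, and the random features stand in for CatBERTa's pooled text embedding and an EquiformerV2 graph embedding rather than the paper's exact implementation.

```python
# Sketch of graph-assisted pretraining: align text and graph embeddings
# in a shared space so the text encoder absorbs structural information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Projects both modalities into a shared space for alignment."""
    def __init__(self, text_dim=768, graph_dim=256, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.graph_proj = nn.Linear(graph_dim, shared_dim)

    def forward(self, text_emb, graph_emb):
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        g = F.normalize(self.graph_proj(graph_emb), dim=-1)
        return t, g

def clip_style_loss(t, g, temperature=0.07):
    """Symmetric InfoNCE loss: matching text/graph pairs attract,
    mismatched pairs within the batch repel."""
    logits = t @ g.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(t.size(0), device=t.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# One pretraining step, with encoder outputs stubbed as random features:
B = 8
text_emb = torch.randn(B, 768)    # stand-in for CatBERTa text embedding
graph_emb = torch.randn(B, 256)   # stand-in for EquiformerV2 graph embedding
head = AlignmentHead()
t, g = head(text_emb, graph_emb)
loss = clip_style_loss(t, g)
loss.backward()
```

In the subsequent fine-tuning stage, the aligned text encoder would be topped with a small regression head and trained on DFT energy labels, e.g. with an L1 (MAE) objective.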
The study also explores using large language models (LLMs) to generate text inputs for the predictive model from chemical composition and surface orientation alone, without exact atomic positions, so that energies can be predicted even when the full structure of the adsorbate-catalyst configuration is unknown. The method is validated on the OC20 dataset, with over 1.2 million DFT relaxations, and the OC20-Dense dataset, with 995 distinct adsorbate-catalyst pairs. The results show that the model accurately predicts adsorption energies even when exact structures are not known, highlighting the value of multimodal learning in catalysis and the potential of language models for energy prediction without geometric information.
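To illustrate the structure-free input path, a hypothetical sketch of how a text input could be assembled from composition and surface orientation only; the `describe_configuration` helper and its template wording are assumptions for illustration (the paper prompts an LLM to generate such descriptions), not the actual prompt or input format.

```python
# Hypothetical sketch: build a structure-free text input for the
# fine-tuned text model from composition and surface orientation only.
# No atomic coordinates are used; an LLM (or a fixed template, as here)
# turns the metadata into the kind of string the text encoder consumes.

def describe_configuration(adsorbate: str, catalyst: str, miller: tuple) -> str:
    """Return a textual description of an adsorbate-catalyst pair
    using only composition and surface orientation (no positions)."""
    h, k, l = miller
    return (
        f"The adsorbate {adsorbate} is placed on the {catalyst} "
        f"({h}{k}{l}) surface. Predict the adsorption energy of this "
        f"configuration from the description alone."
    )

# Example: *CO on a Cu(111) surface.
text_input = describe_configuration("*CO", "Cu", (1, 1, 1))
print(text_input)
# The downstream step (not shown) would tokenize this string and feed it
# to the fine-tuned text model to regress the adsorption energy.
```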