Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

2024 | Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi
Large language models (LLMs) are fine-tuned to generate stable inorganic materials by encoding crystal structures as text. The approach exploits the ability of LLMs to learn patterns from sequential data, so the generated structures obey physical constraints. Crystal structures are serialized as strings, and a base LLM (LLaMA-2) is adapted with parameter-efficient fine-tuning, a multitask curriculum, and translation augmentations.

The fine-tuned LLaMA-2 70B model generates materials predicted to be metastable at a higher rate than a competing diffusion model (CDVAE). The same models support unconditional generation, text-conditional generation, and structural infilling, and infilling can be used to optimize the properties of existing materials. The models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are well suited to atomistic data.

Comparisons with other generative models show that the fine-tuned LLMs produce materials with high validity and stability, with performance improving as the models grow larger. The study also discusses limitations of the approach, including the potential for hallucination and the need for further work to improve LLM performance in materials generation.
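To make the string encoding and translation augmentation concrete, the sketch below shows one plausible way to serialize a crystal (lattice lengths and angles, then element symbols with fractional coordinates) and to apply a random rigid translation as a data augmentation. The function names, rounding choices, and the toy NaCl cell are illustrative assumptions, not the paper's released code.

import random

def encode_crystal(lengths, angles, species, frac_coords, decimals=2):
    """Serialize a crystal as plain text: lattice lengths and angles on the
    first two lines, then one block per atom (element symbol followed by its
    fractional coordinates). Rounding keeps the token count small."""
    lines = [
        " ".join(f"{x:.1f}" for x in lengths),
        " ".join(f"{a:.0f}" for a in angles),
    ]
    for elem, coords in zip(species, frac_coords):
        lines.append(elem)
        lines.append(" ".join(f"{c:.{decimals}f}" for c in coords))
    return "\n".join(lines)

def translate_augment(frac_coords, rng=random):
    """Shift all sites by a random translation in fractional coordinates
    (mod 1). This is a symmetry of the crystal, so the shifted structure is
    an equivalent, cheaper-to-obtain training string."""
    shift = [rng.random() for _ in range(3)]
    return [[(c + s) % 1.0 for c, s in zip(site, shift)] for site in frac_coords]

# Toy example: two sites of rock-salt NaCl in a cubic cell.
lengths = (5.6, 5.6, 5.6)
angles = (90, 90, 90)
species = ["Na", "Cl"]
coords = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]

print(encode_crystal(lengths, angles, species, coords))
print("---")
print(encode_crystal(lengths, angles, species, translate_augment(coords)))

Strings of this form can then be wrapped in task-specific prompts (generate a new structure, complete a partially masked one, or condition on a text description) for the multitask fine-tuning described above.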