Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering

22 Apr 2024 | Hongxuan Liu, Haoyu Yin, Zhiyao Luo, Xiaonan Wang
This paper presents a study on integrating domain-specific knowledge into prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. A benchmark dataset is created that captures the physical-chemical properties of small molecules, their druggability, and the functional attributes of enzymes and crystal materials. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies on metrics of capability, accuracy, F1 score, and hallucination drop. Its effectiveness is further demonstrated through case studies on complex materials such as the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results show that domain-knowledge prompts can guide LLMs to generate more accurate and relevant responses, highlighting the potential of LLMs as tools for scientific discovery and innovation when equipped with domain-specific prompts. The study also discusses limitations and future directions for domain-specific prompt engineering.

The paper introduces "domain-knowledge embedded prompt engineering" as a novel approach to enhancing LLM performance in specialized areas. It first constructs domain-specific datasets for small molecules, enzymes, and crystal materials, then develops and tests task-specific prompts for a range of problems in chemistry, materials science, and biology. The method is benchmarked against general-purpose prompt engineering techniques from computer science to validate its effectiveness, using evaluation metrics designed to match the desired scientific outcomes. The work also addresses the tendency of LLMs to generate inaccurate or "hallucinated" responses and designs prompting strategies to mitigate it. Through case studies, it demonstrates how these strategies can address specific challenges in each field, showing that domain-knowledge embedded prompt engineering offers a cost-effective and efficient way to leverage the potential of LLMs.

The paper evaluates LLM prompt engineering methods across tasks using four metrics: capability, accuracy, F1 score, and hallucination drop. The domain-knowledge embedded method outperforms traditional methods on most tasks and metrics: it performs significantly better on tasks involving small molecules and crystal materials, and on more than half of the enzyme-related tasks. It effectively reduces hallucination, with the largest gains on tasks involving experimental data. The results also show that LLMs perform better on verbal tasks than on numerical tasks, and that domain-knowledge embedded prompts significantly improve performance on tasks requiring logical reasoning.

The paper further compares prompt engineering methods across chain-of-thought (CoT) complexities and material types. Domain-knowledge embedded prompt engineering produces the greatest performance lift on tasks with the most complicated CoT formulations, and providing additional in-context information effectively reduces hallucination levels. Comparing performance across materials shows that prediction accuracy decreases for larger and more complex materials, and highlights a correlation between unit cell symmetry and prediction accuracy for crystalline materials.
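To make the contrast between generic and domain-knowledge embedded prompting concrete, the following is a minimal Python sketch of how the two prompt styles might be assembled for a small-molecule property question. The template wording, property fields, and the aspirin example are illustrative assumptions for exposition only, not the paper's actual prompt templates or benchmark entries.

```python
# Minimal sketch contrasting a generic zero-shot prompt with a domain-knowledge
# embedded prompt for an LLM query about a small molecule. The template wording,
# property fields, and example molecule are illustrative assumptions, not the
# paper's actual prompt templates or dataset entries.

def generic_prompt(question: str) -> str:
    """Plain zero-shot prompting: the question is sent to the LLM as-is."""
    return question


def domain_knowledge_prompt(question: str, smiles: str, known_facts: list[str]) -> str:
    """Wrap the same question in domain context: a molecular identifier, curated
    property facts, an expert persona, and an instruction discouraging guessing."""
    lines = [
        "You are an expert chemist. Use the information below and reason step by step.",
        "",
        f"Molecule (SMILES): {smiles}",
        "Known properties:",
        *[f"- {fact}" for fact in known_facts],
        "",
        f"Question: {question}",
        "Give the answer with a brief justification; if the information provided is",
        "insufficient, say so rather than guessing.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    question = "Is this molecule likely to be orally bioavailable?"
    # Aspirin, used purely as a familiar example molecule.
    print(domain_knowledge_prompt(
        question,
        smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
        known_facts=[
            "Molecular weight: 180.16 g/mol",
            "LogP: approximately 1.2",
            "Hydrogen bond donors: 1; acceptors: 4",
        ],
    ))
```

Either prompt string would then be sent to the LLM under study; the paper's metrics (capability, accuracy, F1 score, and hallucination drop) compare the responses obtained under the two prompting regimes.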
The paper presents three case studies on the MacMillan catalyst, paclitaxel, and lithium cobalt oxide, demonstrating the effectiveness of domain-knowledge embedded prompt engineering on complex, real-world materials.