18 Jun 2024 | Yuzhe Zhang, Yipeng Zhang, Yidong Gan, Lina Yao, Chen Wang
This paper proposes LACR (LLM Assisted Causal Recovery), a novel method for causal graph discovery with large language models (LLMs). LACR combines the knowledge stored in LLMs with additional scientific knowledge from databases and experimental data to deduce causal relationships. It uses a prompting strategy to extract associational relationships among variables and a separate mechanism to verify causality from those associations. Unlike other LLM-based methods that instruct LLMs to perform complex causal reasoning directly, LACR restricts the LLM to low-complexity associational reasoning, which makes it more efficient and accurate.

To strengthen this associational reasoning, LACR augments the LLM's knowledge base with Retrieval-Augmented Generation (RAG) and aggregates evidence from related literature to improve causal recovery accuracy. The method is data-driven: it does not rely on task-specific knowledge for document retrieval or prompt design, so it can serve as a causal graph recovery tool for generic tasks.
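The summary describes LACR's core step only at a high level: querying an LLM, grounded by retrieved literature, for low-complexity associational judgments about variable pairs. The Python sketch below illustrates one way such a RAG-assisted query loop could look; `retrieve_passages`, `query_llm`, and the prompt wording are hypothetical placeholders and assumptions, not part of the paper's implementation.

```python
# Minimal sketch of a RAG-assisted associational query step, under the assumptions
# stated above. `retrieve_passages` and `query_llm` are placeholders for a document
# retriever and an LLM client.

from itertools import combinations


def retrieve_passages(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k literature passages relevant to `query`."""
    raise NotImplementedError("plug in a retriever, e.g. a vector store")


def query_llm(prompt: str) -> str:
    """Placeholder: return the LLM's raw text answer for `prompt`."""
    raise NotImplementedError("plug in an LLM client")


def associational_verdict(var_a: str, var_b: str) -> str:
    """Ask a low-complexity associational question about one variable pair,
    grounding the prompt in retrieved literature rather than the LLM's memory."""
    passages = retrieve_passages(f"association between {var_a} and {var_b}")
    context = "\n\n".join(passages)
    prompt = (
        f"Based only on the excerpts below, are '{var_a}' and '{var_b}' "
        f"statistically associated? Answer ASSOCIATED, NOT_ASSOCIATED, or UNKNOWN.\n\n"
        f"Excerpts:\n{context}"
    )
    return query_llm(prompt).strip().upper()


def pairwise_associations(variables: list[str]) -> dict[tuple[str, str], str]:
    """Collect an associational verdict for every unordered variable pair."""
    return {
        (a, b): associational_verdict(a, b)
        for a, b in combinations(variables, 2)
    }
```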
LACR is validated on several benchmark datasets, where it outperforms existing methods. It is also sensitive to new evidence in the literature and can update causal graphs accordingly. The paper highlights the importance of updating ground-truth causal graphs in light of recent research findings: LACR's results reveal conflicts between traditional ground truths and state-of-the-art domain research, underscoring the need to refine validation datasets. The paper also discusses technical limitations, including search accuracy, LLM comprehension of professional documents, and method complexity, as well as practical limitations such as the need for up-to-date validation datasets and access to scientific papers. Overall, LACR provides a structured and systematic approach to inferring causal relationships, leveraging LLMs for efficient and accurate causal graph discovery.
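The aggregation step mentioned earlier, combining evidence from multiple retrieved documents into a single causal-recovery decision, is likewise only described abstractly. Below is a minimal, hedged sketch assuming a simple majority-vote rule over per-passage verdicts; the paper's actual aggregation and causality-verification logic may differ.

```python
# Hedged sketch of aggregating per-passage verdicts for each variable pair into a
# candidate edge set. Majority voting is an illustrative assumption, not the
# paper's stated rule.

from collections import Counter


def aggregate_edge(verdicts: list[str], min_support: float = 0.5) -> bool:
    """Keep an undirected edge if the fraction of ASSOCIATED verdicts among the
    non-UNKNOWN ones exceeds `min_support`."""
    informative = [v for v in verdicts if v != "UNKNOWN"]
    if not informative:
        return False
    counts = Counter(informative)
    return counts["ASSOCIATED"] / len(informative) > min_support


def build_skeleton(pair_verdicts: dict[tuple[str, str], list[str]]) -> set[tuple[str, str]]:
    """Return the set of variable pairs retained as candidate causal edges."""
    return {pair for pair, verdicts in pair_verdicts.items() if aggregate_edge(verdicts)}


# Example: two pairs, each judged against several retrieved passages.
skeleton = build_skeleton({
    ("smoking", "lung_cancer"): ["ASSOCIATED", "ASSOCIATED", "UNKNOWN"],
    ("smoking", "shoe_size"): ["NOT_ASSOCIATED", "UNKNOWN", "NOT_ASSOCIATED"],
})
print(skeleton)  # {('smoking', 'lung_cancer')}
```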