Understanding Traditional Chinese Medicine Knowledge Graph Construction Based on Large Language Models

This study explores the use of large language models (LLMs) to construct a knowledge graph for Traditional Chinese Medicine (TCM), with the aim of improving the representation, storage, and application of TCM knowledge. The knowledge graph uses a graph structure to organize entities, attributes, and relationships within the TCM domain. Using LLMs, we collected and embedded a substantial body of TCM-related data, generating precise representations that were then transformed into a knowledge graph format. Experimental evaluations confirmed the accuracy and effectiveness of the constructed graph, which captures a variety of entities and their relationships and provides a solid foundation for TCM learning, research, and application. The knowledge graph has significant potential in TCM: it can aid teaching, disease diagnosis, and treatment decisions, and contribute to TCM modernization.

Traditional knowledge graph construction often relies on extensive manual work and expert knowledge, which can lead to inefficiency and errors when dealing with massive data and complex relationships. Construction methods are typically manual, automatic, or semi-automatic. In manual construction, domain experts enter entities, attributes, and relationships by hand; this is time-consuming and labor-intensive, with limited applicability. Automatic construction uses information extraction techniques to extract knowledge from structured and unstructured data, but may fall short in accuracy and completeness. Semi-automatic construction combines the two, using assistive tools or algorithms to guide experts through construction, and has achieved some success. With the development of LLMs, their application to knowledge graph construction has become an active research topic.
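To make the graph structure concrete, the following minimal sketch shows one way entities, attributes, and relationships might be stored and queried. The class name, method names, relation label, and the example formula and herb entries are illustrative assumptions; they are not taken from the study, whose actual schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Maps each entity name to its attribute dictionary.
    attributes: dict = field(default_factory=dict)
    # Set of (head, relation, tail) triples linking entities.
    triples: set = field(default_factory=set)

    def add_entity(self, name, **attrs):
        # Create the entity if needed, then merge in the given attributes.
        self.attributes.setdefault(name, {}).update(attrs)

    def add_relation(self, head, relation, tail):
        # Register both endpoints so every triple refers to known entities.
        self.attributes.setdefault(head, {})
        self.attributes.setdefault(tail, {})
        self.triples.add((head, relation, tail))

    def neighbors(self, head, relation):
        # All tails reachable from `head` via `relation`, in sorted order.
        return sorted(t for h, r, t in self.triples if h == head and r == relation)


# Illustrative entries only (assumed example data, not from the study).
kg = KnowledgeGraph()
kg.add_entity("Gui Zhi Tang", type="formula")
kg.add_entity("Gui Zhi", type="herb", nature="warm")
kg.add_relation("Gui Zhi Tang", "contains", "Gui Zhi")
kg.add_relation("Gui Zhi Tang", "contains", "Bai Shao")

print(kg.neighbors("Gui Zhi Tang", "contains"))  # ['Bai Shao', 'Gui Zhi']
```

Storing relationships as triples keeps the sketch close to how knowledge graphs are commonly serialized; a production system would more likely use a graph database than in-memory sets.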
LLMs possess outstanding representation learning capabilities, extracting rich semantic information from text by learning from extensive corpora. Introducing LLMs into the knowledge graph construction process allows entities, attributes, and relationships to be extracted from text automatically, significantly reducing the manual annotation workload and improving construction efficiency while maintaining accuracy. LLMs can also handle multimodal data, such as text and images, which supports richer and more diverse knowledge graphs.

The objective of constructing a knowledge graph for TCM is to structurally represent and link the entities, relationships, and attributes related to TCM, forming a comprehensive and accurate network of TCM knowledge. Such a knowledge graph can assist healthcare professionals in disease differentiation and treatment, support clinical decision-making, and provide rich data for TCM research. It also facilitates the integration of TCM with modern medicine, opening new possibilities for interdisciplinary medical research and applications.

Although several studies have addressed the construction of TCM knowledge graphs, the field still faces numerous challenges. The primary ones are the complexity, diversity, and ambiguity of TCM knowledge, which make it difficult to characterize and correlate different types of knowledge accurately. In addition, because the TCM knowledge system is vast and decentralized, collecting, integrating, and storing the relevant knowledge is itself a challenging task. Finally, ensuring the timeliness and updateability of the knowledge graph is crucial.
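As a rough illustration of the LLM-based extraction of entities and relationships described above, the sketch below prompts a model for triples and validates the reply before anything enters the graph. The prompt wording, the JSON reply format, the `extract_triples` function, and the `fake_llm` stub are all assumptions made for this example; the study's own pipeline is not specified in this text, and a real deployment would replace the stub with an actual model call.

```python
import json
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed prompt wording; not taken from the study.
PROMPT_TEMPLATE = (
    "Extract (head, relation, tail) triples about Traditional Chinese Medicine "
    "from the text below. Reply with a JSON list of 3-element lists only.\n\n"
    "Text: {text}"
)


def extract_triples(text: str, llm: Callable[[str], str]) -> List[Triple]:
    """Ask an LLM for triples and keep only the well-formed ones."""
    raw = llm(PROMPT_TEMPLATE.format(text=text))
    try:
        candidates = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Reply was not valid JSON; treat it as no extraction.
    if not isinstance(candidates, list):
        return []
    triples = []
    for item in candidates:
        # Keep only items that are exactly three non-empty strings.
        if (
            isinstance(item, list)
            and len(item) == 3
            and all(isinstance(part, str) and part.strip() for part in item)
        ):
            triples.append(tuple(part.strip() for part in item))
    return triples


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns a canned reply.
    return json.dumps([
        ["Gui Zhi Tang", "contains", "Gui Zhi"],
        ["Gui Zhi Tang", "contains"],  # malformed: dropped by validation
    ])


print(extract_triples("Gui Zhi Tang contains Gui Zhi.", fake_llm))
# [('Gui Zhi Tang', 'contains', 'Gui Zhi')]
```

Validating the model's reply before insertion is one simple guard against the accuracy and completeness problems noted for automatic construction; expert review of the extracted triples would still be advisable.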

Traditional Chinese Medicine Knowledge Graph Construction Based on Large Language Models

7 April 2024 | Yichong Zhang and Yongtao Hao