25 Apr 2024 | Yanzeng Li, Cheng Zeng, Jialun Zhong, Ruoyu Zhang, Minhao Zhang, Lei Zou
This paper presents CureFun, an integrated model-agnostic framework that leverages large language models (LLMs) to simulate patients for clinical education. The framework enables natural conversations between students and simulated patients, evaluates their dialogue, and provides suggestions to enhance clinical inquiry skills. It addresses the limitations of traditional simulated patients (SPs), such as high costs and physical/psychological risks, by using LLMs to create more authentic and professional SP-scenario dialogue flows. CureFun also assesses several medical LLMs and discusses their potential and limitations as virtual doctors.
The framework includes a graph-driven context-adaptive SP chatbot that dynamically adjusts dialogue flow using a structured graph memory. It also features an LLM-based automatic assessment system that evaluates students' medical dialogues through a structured checklist and provides scores and suggestions. The system uses a combination of LLMs to ensure accurate and reliable assessments.
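To make the graph-memory idea concrete, here is a minimal illustrative sketch (not the paper's actual code) of a simulated-patient chatbot that grounds each reply in facts retrieved from a structured graph of the case. The `query_llm` function and the triple-based `GraphMemory` class are assumptions introduced here for illustration; any chat-oriented LLM API could stand behind `query_llm`.

```python
# Sketch only: grounding a simulated-patient reply in a structured graph memory.
# GraphMemory and query_llm are hypothetical names, not CureFun's real API.

class GraphMemory:
    """Case facts stored as (entity, relation, value) triples."""

    def __init__(self, triples):
        self.triples = list(triples)

    def retrieve(self, question):
        """Return triples whose entity or value is mentioned in the question."""
        q = question.lower()
        return [t for t in self.triples
                if t[0].lower() in q or str(t[2]).lower() in q]


def sp_reply(question, memory, query_llm):
    """Answer a student's question using only retrieved graph facts."""
    facts = memory.retrieve(question)
    context = "; ".join(f"{e} {r} {v}" for e, r, v in facts) or "no matching fact"
    prompt = (f"You are a simulated patient. Known case facts: {context}. "
              f"The student asks: {question} "
              f"Reply in character and do not invent facts beyond those given.")
    return query_llm(prompt)
```

Restricting the prompt to retrieved facts is one plausible way to keep the dialogue consistent with the case and to curb hallucinated symptoms, which is the role the paper assigns to its graph memory.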
The study evaluates the performance of various LLMs in acting as SPs and as virtual doctors (VDs). Results show that CureFun significantly improves the performance of LLMs in simulating patients, particularly in maintaining dialogue consistency and providing accurate responses. The framework also demonstrates high correlation between automated assessments and human evaluations, indicating its reliability.
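The checklist-based assessment described above can be sketched as follows. This is an illustrative outline under assumptions, not the paper's implementation: `ask_llm` is a hypothetical wrapper around any chat LLM, and the yes/no grading scheme is one simple way to turn a structured checklist into a score.

```python
# Sketch only: scoring a student-patient dialogue against a structured checklist.
# ask_llm is a hypothetical stand-in for a chat-oriented LLM call.

def assess_dialogue(dialogue, checklist, ask_llm):
    """Return per-item pass/fail results and an overall score in [0, 1]."""
    results = {}
    for item in checklist:
        answer = ask_llm(
            f"Dialogue:\n{dialogue}\n"
            f"Did the student do the following: {item}? Answer yes or no.")
        results[item] = answer.strip().lower().startswith("yes")
    score = sum(results.values()) / len(checklist) if checklist else 0.0
    return results, score
```

Each checklist item becomes an independent grading query, so per-item results double as the suggestions the system returns; the paper reports that scores produced this way correlate highly with human evaluations.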
The study highlights the potential of LLMs as virtual simulated patients (VSPs) for more efficient clinical education and offers insights into the development of medical LLMs for intelligent diagnosis and treatment. It also addresses challenges in conversational AI, such as hallucinations, instruction leakage, and toxic responses, and proposes solutions to improve the performance and reliability of VSPs. The framework is designed to be model-agnostic, allowing it to accommodate a wide range of chat-oriented LLMs.