18 Jul 2024 | Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang
This paper introduces Chain-of-Diagnosis (CoD), an interpretable medical agent that improves the transparency and controllability of large language models (LLMs) in medical diagnosis. CoD casts the diagnostic process as a diagnostic chain that mirrors a physician's thought process, yielding a transparent reasoning pathway. At each step it outputs a confidence distribution over candidate diseases, making the decision process transparent, and a confidence threshold gives control over when the model commits to a diagnosis. Diagnostic uncertainty can in turn be quantified as the entropy of these confidences, and choosing inquiries that reduce entropy helps elicit the most informative symptoms to ask about.
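As a concrete illustration of this idea, the sketch below computes the Shannon entropy of a disease-confidence distribution and compares it before and after a hypothetical symptom inquiry. The disease names and probabilities are invented for illustration and do not come from the paper.

```python
import math

def entropy(confidences: dict[str, float]) -> float:
    """Shannon entropy (in bits) of a disease-confidence distribution."""
    total = sum(confidences.values())
    return -sum((p / total) * math.log2(p / total)
                for p in confidences.values() if p > 0)

# Hypothetical confidences over three candidate diseases (illustrative only).
before_inquiry = {"influenza": 0.40, "common cold": 0.35, "COVID-19": 0.25}
# Suppose asking about "loss of smell" sharpens the distribution like this.
after_inquiry = {"influenza": 0.15, "common cold": 0.10, "COVID-19": 0.75}

print(f"entropy before inquiry: {entropy(before_inquiry):.3f} bits")
print(f"entropy after inquiry:  {entropy(after_inquiry):.3f} bits")
# The inquiry whose answer reduces entropy the most is the most
# informative question to ask next.
```

Under this view, symptom selection becomes a search for the question with the largest expected entropy drop.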
The CoD framework includes a pipeline chain that enables decomposability and algorithmic transparency. Decomposability allows each step of the diagnostic process to be individually interpreted, while algorithmic transparency ensures the model's learning algorithm is understood. CoD also provides post-hoc explanations, elucidating the diagnostic thinking process and supporting clinical decisions.
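To make decomposability concrete, each step of the chain could be represented as a self-contained record that can be audited in isolation. The schema below is our own sketch of such a record, not the paper's actual data format.

```python
from dataclasses import dataclass

@dataclass
class DiagnosticStep:
    """One interpretable step of a diagnostic chain (hypothetical schema)."""
    known_symptoms: list[str]      # symptoms confirmed so far
    confidences: dict[str, float]  # candidate diseases with confidence scores
    action: str                    # e.g. "inquire: loss of smell" or "diagnose: influenza"
    rationale: str                 # free-text explanation supporting the action

# A chain is an ordered list of steps, each inspectable on its own.
chain: list[DiagnosticStep] = [
    DiagnosticStep(
        known_symptoms=["cough", "fatigue"],
        confidences={"influenza": 0.45, "common cold": 0.35, "COVID-19": 0.20},
        action="inquire: loss of smell",
        rationale="The distribution is still too flat to commit to a diagnosis.",
    )
]
```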
To implement CoD, the authors propose a method for constructing CoD training data from patient cases. Because real patient cases raise privacy concerns, they instead generate synthetic cases from disease encyclopedias, enabling scalable training data construction without touching patient records. This approach leads to DiagnosisGPT, an LLM capable of diagnosing 9,604 diseases. Experimental results show that DiagnosisGPT outperforms other LLMs on diagnostic benchmarks and achieves over 90% accuracy on all datasets with a diagnostic threshold of 0.55.
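A minimal sketch of how such a threshold could gate the agent's behavior, assuming the model exposes a per-disease confidence distribution at each turn: the 0.55 value matches the paper's reported setting, while the decision rule and example distributions are illustrative.

```python
def next_action(confidences: dict[str, float], threshold: float = 0.55) -> str:
    """Commit to a diagnosis only when the top confidence clears the
    threshold; otherwise keep gathering evidence via symptom inquiry."""
    disease, top = max(confidences.items(), key=lambda kv: kv[1])
    if top >= threshold:
        return f"diagnose: {disease} (confidence {top:.2f})"
    return "inquire: top confidence below threshold, ask about another symptom"

print(next_action({"influenza": 0.62, "common cold": 0.38}))  # confident enough
print(next_action({"influenza": 0.48, "common cold": 0.52}))  # keep inquiring
```

Raising the threshold trades coverage for reliability: the agent answers fewer cases outright but is more often correct when it does commit.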
The authors also present DxBench, a diagnostic benchmark of 1,148 real cases covering 461 diseases, derived from public doctor-patient dialogues and manually verified. DiagnosisGPT delivers superior performance across the diagnostic datasets, showing the largest accuracy gains when permitted to ask follow-up symptom questions. Its confidence estimates prove reliable, and its diagnostic capability holds up in both open-ended and structured consultation scenarios.
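Evaluation on a benchmark like DxBench presumably reduces to checking whether the ground-truth disease appears among the model's ranked predictions. The record format below is a placeholder of our own, not DxBench's actual schema.

```python
def top_k_accuracy(cases: list[dict], k: int = 1) -> float:
    """Fraction of cases whose true disease is among the top-k predictions."""
    hits = sum(case["label"] in case["predictions"][:k] for case in cases)
    return hits / len(cases)

# Hypothetical records: model's ranked predictions plus the verified diagnosis.
cases = [
    {"predictions": ["influenza", "common cold"], "label": "influenza"},
    {"predictions": ["migraine", "tension headache"], "label": "tension headache"},
]
print(f"top-1 accuracy: {top_k_accuracy(cases, k=1):.2f}")  # 0.50
print(f"top-3 accuracy: {top_k_accuracy(cases, k=3):.2f}")  # 1.00
```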
The study underscores the importance of interpretability in medical diagnosis and presents CoD as a novel solution. The data, models, and methods from this work can advance the field of medical LLMs.