MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning

4 Jun 2024 | Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng, Jonathan Ilgen, Emma Pierson, Pang Wei Koh & Yulia Tsvetkov
This paper introduces MEDIQ, a framework for simulating realistic clinical interactions to evaluate and improve the reliability and adaptability of large language models (LLMs) in clinical reasoning. MEDIQ pairs a Patient system with an adaptive Expert system: the Patient initially reveals only partial information, and the Expert must ask follow-up questions to gather the information it needs before committing to a decision. The Expert system is evaluated on two medical QA datasets, MEDQA and CRAFT-MD, converted into this interactive setup.

The results show that state-of-the-art LLMs struggle to proactively seek information in realistic interactive settings: performance degrades when models are required to ask questions rather than being given the full case up front. However, augmenting the Expert system with a confidence estimation module, which the Expert uses to decide whether to ask another question or commit to an answer, improves diagnostic accuracy by 22.3%. While LLMs perform well with complete information, they falter in realistic, information-limited scenarios.

The paper argues that interactive information-seeking is essential in clinical settings and positions MEDIQ as a benchmark for evaluating LLMs in dynamic, information-seeking environments. It also discusses the limitations of current LLMs in clinical reasoning and calls for further research to strengthen their information-seeking capabilities.
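To make the interaction loop concrete, below is a minimal, self-contained sketch of an Expert–Patient exchange of the kind the summary describes. It is not the authors' implementation: the function names (`estimate_confidence`, `generate_followup_question`, `make_final_decision`) and the heuristic confidence score are hypothetical stand-ins for LLM calls.

```python
# Illustrative sketch of a MEDIQ-style consultation loop (assumed structure, not
# the paper's actual code). LLM calls are replaced by simple stubs.

from typing import Callable, List


def estimate_confidence(facts: List[str]) -> float:
    """Stub confidence estimator. In the framework described above, an LLM-based
    module would score the Expert's confidence; here we use a toy heuristic that
    grows with the amount of information gathered."""
    return min(1.0, 0.3 + 0.2 * len(facts))


def generate_followup_question(facts: List[str]) -> str:
    """Stub: an LLM would generate the most informative next question."""
    return f"Follow-up question #{len(facts)} about the patient's presentation?"


def make_final_decision(facts: List[str]) -> str:
    """Stub: an LLM would produce the final diagnosis or answer choice."""
    return "final decision based on: " + "; ".join(facts)


def run_consultation(
    initial_info: str,
    patient_answer_fn: Callable[[str], str],
    threshold: float = 0.8,
    max_turns: int = 10,
) -> str:
    """Expert loop: keep asking follow-up questions until confidence clears the
    threshold (or the turn budget runs out), then commit to an answer."""
    facts = [initial_info]  # the Patient reveals only partial information up front
    for _ in range(max_turns):
        if estimate_confidence(facts) >= threshold:
            break  # confident enough to answer without further questions
        question = generate_followup_question(facts)
        facts.append(f"Q: {question} A: {patient_answer_fn(question)}")
    return make_final_decision(facts)


if __name__ == "__main__":
    # The Patient system would normally answer from the full patient record;
    # here it is a trivial stand-in for demonstration.
    print(run_consultation("45-year-old with chest pain",
                           lambda q: "patient's answer to: " + q))
```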
[slides and audio] MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning