4 Jun 2024 | Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng, Jonathan Ilgen, Emma Pierson, Pang Wei Koh, Yulia Tsvetkov
The paper "MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning" addresses the challenge of developing reliable and safe AI assistants for high-stakes domains like clinical reasoning. The authors identify that current large language models (LLMs) are trained to answer any question, even with incomplete context, which can lead to unreliable and unsafe decisions. To address this, they propose MEDIQ, a framework that simulates realistic clinical interactions, incorporating a Patient System and an adaptive Expert System. The Patient System provides incomplete information initially, and the Expert System elicits missing details through follow-up questions to gather necessary and sufficient information for accurate diagnosis.
The authors evaluate MEDIQ using the MEDQA and CRAFT-MD datasets, converting them into an interactive setup where the Expert System must ask questions to gather information. They develop a reliable Patient System and prototype several Expert Systems, finding that simply prompting state-of-the-art LLMs to ask questions degrades clinical reasoning quality. To address this, they augment the Expert System with a novel abstention module that estimates model confidence and decides whether to ask more questions, improving diagnostic accuracy by 22.3%. However, performance still lags behind the upper bound achieved when full information is provided upfront.
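One plausible way to instantiate such an abstention module is self-consistency sampling: draw several stochastic answers and treat the majority answer's frequency as a confidence score. The sketch below shows this idea under stated assumptions; `sample_answer`, `abstain_or_answer`, and the threshold value are illustrative choices, not the paper's exact method.

```python
from collections import Counter
from typing import Callable, List, Tuple

def abstain_or_answer(
    sample_answer: Callable[[str, List[str]], str],  # one stochastic LLM call
    question: str,
    context: List[str],
    n_samples: int = 5,
    threshold: float = 0.8,
) -> Tuple[bool, str, float]:
    """Decide whether to commit to a diagnosis or keep asking questions.

    Confidence is estimated as the empirical frequency of the majority
    answer across several samples (a self-consistency-style estimate,
    offered here as one hypothetical instantiation). Below `threshold`,
    the Expert abstains and asks another follow-up instead.
    """
    votes = Counter(sample_answer(question, context) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    confidence = count / n_samples
    return confidence >= threshold, answer, confidence
```

Wiring this into the encounter loop above, the Expert would take the "answer" branch only when `abstain_or_answer` returns `True`, which is what lets the system trade additional questions for higher diagnostic accuracy.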
The paper highlights the importance of interactive information-seeking abilities in LLMs for critical domains and provides a modular, interactive benchmark to facilitate the development of reliable LLM assistants. Key contributions include the introduction of the MEDIQ framework, the development of a reliable Patient System, and the demonstration of the gap between current LLMs and realistic clinical scenarios. The authors also discuss future directions, such as improving confidence estimation and integrating external domain knowledge.