Towards Conversational Diagnostic AI

11 Jan 2024 | Tao Tu*, Anil Palepu*, Mike Schaeckermann*, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singh, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S. Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S. Corrado, Yossi Matias, Alan Karthikesalingam†, Vivek Natarajan†
This paper introduces AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM) based AI system optimized for diagnostic dialogue.

AMIE was fine-tuned on a diverse suite of real-world datasets, including multiple-choice medical question answering, expert-curated long-form medical reasoning, electronic health record (EHR) note summaries, and large-scale transcribed medical conversations, alongside simulated medical dialogues. To scale learning across diverse disease conditions, specialties, and contexts, the authors built a self-play based simulated dialogue environment with automated feedback mechanisms that enriches and accelerates learning. At inference time, AMIE employs a chain-of-reasoning strategy that progressively refines its response, conditioned on the current conversation, to arrive at an accurate and grounded reply.

The system was evaluated in a randomized, double-blind crossover study against 20 primary care physicians (PCPs), using 149 case scenarios from clinical providers in Canada, the UK, and India. Consultations were assessed by specialist physicians and patient actors using a pilot evaluation rubric covering history-taking, diagnostic reasoning, communication skills, and empathy. AMIE demonstrated greater diagnostic accuracy than the PCPs and superior performance on 28 of 32 evaluation axes according to specialist physicians and 24 of 26 axes according to patient actors.
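The chain-of-reasoning strategy above can be sketched as a draft-critique-rewrite loop. This is a minimal illustration, not AMIE's actual implementation: the `llm` callable, the prompt wording, and the number of refinement steps are all assumptions.

```python
def chain_of_reasoning_reply(llm, conversation, n_steps=2):
    """Progressively refine a draft reply conditioned on the dialogue so far.

    `llm` is any callable mapping a prompt string to a completion string
    (hypothetical interface; swap in a real model client as needed).
    """
    transcript = "\n".join(conversation)
    # Step 1: draft an initial reply grounded in the conversation.
    draft = llm(f"Conversation so far:\n{transcript}\n\nDraft a diagnostic reply:")
    for _ in range(n_steps):
        # Step 2: critique the draft against the conversation.
        critique = llm(
            f"Conversation:\n{transcript}\n\nDraft reply:\n{draft}\n\n"
            "Critique the draft for accuracy and grounding:"
        )
        # Step 3: rewrite the draft to address the critique.
        draft = llm(
            f"Conversation:\n{transcript}\n\nDraft:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nRewrite the reply addressing the critique:"
        )
    return draft

# Toy stand-in LLM so the sketch runs without any model dependency.
def toy_llm(prompt):
    return "refined reply" if "Rewrite" in prompt else "note"

print(chain_of_reasoning_reply(toy_llm, ["Patient: I have a headache."]))
```

The key design point is that every refinement step re-conditions on the full conversation, so later drafts stay grounded in what the patient actually said rather than drifting from the model's own earlier output.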
These results suggest that AMIE could improve access to diagnostic and prognostic expertise; improve the quality, consistency, availability, and affordability of care; and help realize better health outcomes. The study has limitations, however, including its use of a text-chat interface, which is unfamiliar to PCPs for remote consultation. Even so, the results represent a milestone towards conversational diagnostic AI.