03 April 2024 | Nikita Mehandru, Brenda Y. Miao, Eduardo Rodriguez Almaraz, Madhumita Sushil, Atul J. Butte & Ahmed Alaa
Recent advancements in large language models (LLMs) have opened new opportunities in healthcare, including information synthesis and clinical decision support. Unlike models assessed with traditional static benchmarks, LLMs can act as intelligent "agents" that interact with stakeholders in open-ended conversations and influence clinical decisions. These agents should be evaluated in high-fidelity clinical simulations, such as "Artificial Intelligence Structured Clinical Examinations" (AI-SCE), drawing on evaluation frameworks for other autonomous technologies, such as self-driving cars, that operate in dynamic environments.
The release of ChatGPT has brought LLMs into the spotlight, with models like Med-PaLM 2 performing at human expert level on medical questions. GPT-4 has shown potential in summarizing physician-patient encounters, achieving high scores on medical licensing exams, and generating clinical question-answer pairs. These models can perform complex tasks beyond traditional NLP benchmarks, such as multi-step reasoning and generating simulated clinical text.
LLM agents can be developed for various clinical use cases by providing access to clinical guidelines, databases, and other tools. These agents can autonomously retrieve information, perform multi-step analyses, and interact with other agents or external users. Healthcare systems are already integrating LLMs into patient messaging systems, and some medical centers are exploring "virtual-first" approaches where LLMs assist in patient triaging.
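To make the tool-using agent pattern concrete, here is a minimal sketch of such a loop in Python. Everything in it is a hypothetical stand-in rather than any particular vendor's API: `llm_complete` stubs the model call and `guideline_lookup` stubs a clinical knowledge source. A real agent would route `llm_complete` to a hosted or local model and wire in validated guideline databases.

```python
import json

def llm_complete(messages: list[dict]) -> str:
    """Hypothetical stand-in for a chat-completion call; a real deployment
    would route this to a hosted or local clinical LLM."""
    if messages and messages[-1]["role"] == "tool":
        # After seeing a tool result, the stub produces a final answer.
        return f"Per guideline: {messages[-1]['content']}"
    # Otherwise the stub requests a tool via a structured JSON call.
    return json.dumps({"tool": "guideline_lookup",
                       "query": "community-acquired pneumonia, adult"})

def guideline_lookup(query: str) -> str:
    """Illustrative tool: a tiny lookup table standing in for a guideline database."""
    guidelines = {"community-acquired pneumonia, adult":
                  "Assess CURB-65; outpatient therapy is reasonable for scores 0-1."}
    return guidelines.get(query, "No guideline found.")

TOOLS = {"guideline_lookup": guideline_lookup}

def run_agent(user_message: str, max_steps: int = 3) -> str:
    """Core agent loop: query the model, execute any requested tool,
    append the result to the conversation, and repeat until a plain answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = llm_complete(messages)
        try:
            call = json.loads(reply)   # structured reply = tool request
        except json.JSONDecodeError:
            return reply               # plain text = final answer
        result = TOOLS[call["tool"]](call["query"])
        messages.append({"role": "tool", "content": result})
    return "Step budget exhausted."

print(run_agent("How should I manage a 58-year-old with suspected pneumonia?"))
```

The key design point is the loop itself: the model's replies are either structured tool calls, which the harness executes and feeds back, or free text, which ends the episode. That same loop is what an evaluation environment needs to intercept and log.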
To evaluate LLM-based chatbots, agent-based modeling (ABM) can be used to create simulated environments. ABM has been used in health policy, biology, and social sciences to study health behaviors and disease spread. Similarly, ABM can simulate clinical settings to evaluate how LLM agents interact with users, use tools, and handle errors.
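As a sketch of what such a simulated environment could look like, the Python below pairs a scripted patient agent (analogous to a standardized-patient actor in an OSCE) with a stub standing in for the LLM under test. The class names, case format, and turn logic are all illustrative assumptions, not a published harness.

```python
class SimulatedPatient:
    """Scripted patient agent: reveals one symptom per question,
    much like a standardized-patient actor in an OSCE."""
    def __init__(self, case: dict):
        self.symptoms = list(case["symptoms"])
        self.diagnosis = case["diagnosis"]

    def respond(self, question: str) -> str:
        return self.symptoms.pop(0) if self.symptoms else "No other complaints."

class LLMClinicianStub:
    """Stand-in for the LLM agent under test; a real harness would call the model."""
    def act(self, history: list) -> tuple:
        if len(history) < 3:
            return ("ask", "Can you tell me more about your symptoms?")
        return ("diagnose", "pneumonia")

def simulate_encounter(case: dict, clinician, max_turns: int = 5) -> dict:
    """Run one simulated encounter and log every exchange for later grading."""
    patient, history = SimulatedPatient(case), []
    for _ in range(max_turns):
        action, content = clinician.act(history)
        if action == "diagnose":
            return {"history": history, "prediction": content,
                    "correct": content == patient.diagnosis}
        history.append({"q": content, "a": patient.respond(content)})
    return {"history": history, "prediction": None, "correct": False}

case = {"symptoms": ["fever for three days", "productive cough",
                     "pleuritic chest pain"],
        "diagnosis": "pneumonia"}
print(simulate_encounter(case, LLMClinicianStub()))
```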
AI-SCE benchmarks, similar to OSCEs in medical education, can assess LLMs' ability to aid in real-world clinical workflows. These benchmarks should involve interdisciplinary teams and draw from real-world clinical tasks. AI-SCEs should evaluate both outputs and intermediate steps, capturing the agent's reasoning process and tool usage.
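One way to operationalize grading of intermediate steps is to record the agent's full trace and score it against a station rubric, reporting a process score (were the required steps taken?) separately from an outcome score (was the final answer right?). The trace and rubric formats below are assumptions made for illustration; actual AI-SCE rubrics would be authored by clinical experts.

```python
from dataclasses import dataclass, field

@dataclass
class AgentTrace:
    """Full record of one AI-SCE station: every intermediate step, not just the answer."""
    steps: list = field(default_factory=list)   # e.g. ("tool", "guideline_lookup")
    final_answer: str = ""

def score_station(trace: AgentTrace, rubric: dict) -> dict:
    """Process score: fraction of rubric-required tool uses observed in the trace.
    Outcome score: whether the final answer matches the expected one."""
    observed = {name for kind, name in trace.steps if kind == "tool"}
    required = set(rubric["required_tools"])
    return {
        "process": len(observed & required) / len(required),
        "outcome": float(trace.final_answer == rubric["expected_answer"]),
        "missed_steps": sorted(required - observed),
    }

trace = AgentTrace(steps=[("tool", "guideline_lookup"), ("reason", "CURB-65 = 1")],
                   final_answer="outpatient antibiotics")
rubric = {"required_tools": ["guideline_lookup", "lab_review"],
          "expected_answer": "outpatient antibiotics"}
print(score_station(trace, rubric))  # process 0.5, outcome 1.0, missed lab_review
```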
Evaluations should incorporate human evaluators, external datasets, and post-deployment monitoring to detect data distribution shifts and mitigate bias. Randomized controlled trials should compare performance in simulation environments against real-world settings.
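As one concrete example of the post-deployment monitoring described above, a drift statistic such as the population stability index (PSI) can compare the input distribution seen at validation time against the live population. This is a generic sketch, not a method prescribed by the authors; the 0.2 alert threshold is a common rule of thumb (an assumption here) that must be tuned per deployment.

```python
import numpy as np

def population_stability_index(reference, current, bins: int = 10) -> float:
    """PSI between a reference window and a live window; a common drift heuristic.
    Values above ~0.2 are often treated as meaningful shift (threshold is an
    assumption and should be tuned per deployment)."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) in empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(50, 10, 5000)   # e.g. patient ages at validation time
live = rng.normal(58, 12, 5000)       # post-deployment population has shifted
psi = population_stability_index(baseline, live)
print(f"PSI = {psi:.3f} -> investigate" if psi > 0.2 else f"PSI = {psi:.3f} -> stable")
```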
As LLMs evolve, benchmarks should shift from static datasets to dynamic simulations, moving from language modeling to agent modeling. This approach could benefit future LLM research and development for clinical applications.