Performance of Large Language Models on Medical Oncology Examination Questions

2024 | Jack B. Longwell, HBSc; Ian Hirsch, MD, MSc; Fernando Binder, MD, MPH; Galileo Arturo Gonzalez Conchas, MD; Daniel Mau, HBSc; Raymond Jang, MD, MSc; Rahul G. Krishnan, PhD; Robert C. Grant, MD, PhD
This study evaluates the accuracy and safety of large language models (LLMs) on medical oncology examination questions. Conducted between May 28 and October 11, 2023, it tested 8 LLMs. Proprietary LLM 2 outperformed proprietary LLM 1 and the open-source models, correctly answering 85.0% of the questions, with explanations containing no or minor errors for 93.9% of the questions. However, 81.8% of incorrect answers were rated as having a medium or high likelihood of moderate to severe harm if acted upon in clinical practice. The study highlights the potential of LLMs to improve clinician experience and patient care, but it also underscores the need for further research on safety, particularly in dynamic, high-stakes clinical settings such as medical oncology.