19 March 2024 | Vera Sorin, Benjamin S. Glicksberg, Yaara Artsi, Yiftach Barash, Eli Konen, Girish N. Nadkarni, Eyal Klang
This systematic review examines the potential of large language models (LLMs) such as ChatGPT in breast cancer management. The study searched MEDLINE for relevant studies published before December 22, 2023, using keywords related to LLMs, GPT, ChatGPT, and breast cancer. Six studies evaluating ChatGPT-3.5 or GPT-4 were included, focusing on clinical notes analysis, guideline-based question-answering, and patient management recommendations. Accuracy varied from 50% to 98%, with higher accuracy in structured tasks like information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on question formulation, and missing critical clinical information.
The review concludes that LLMs have potential in breast cancer care, particularly in textual information extraction and guideline-driven clinical question-answering, but their inconsistent accuracy highlights the need for careful validation and ongoing supervision.