2024 | Vera Sorin¹,² · Benjamin S. Glicksberg³ · Yaara Artzi⁴ · Yiftach Barash¹,² · Eli Konen¹ · Girish N. Nadkarni³,⁵ · Eyal Klang³,⁵
A systematic review of large language models (LLMs) in breast cancer management found that six studies evaluated the use of ChatGPT-3.5 and GPT-4 in clinical tasks such as information extraction, guideline-based question-answering, and patient management recommendations. Accuracy ranged from 50% to 98%, and was higher in structured tasks such as information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, prompt dependency, and omission of critical clinical information.
LLMs show potential in breast cancer care, particularly in textual information extraction and guideline-driven clinical question-answering. However, their inconsistent accuracy highlights the need for careful validation and ongoing supervision. The review also noted limitations in LLMs, including the generation of false information, potential perpetuation of healthcare disparities, and the inability to trace decision-making processes. These models can be vulnerable to cyber-attacks and may not reflect real-world clinical performance due to training on vast internet data.
While LLMs can assist in routine tasks, they require further development for personalized treatment planning. The review emphasizes the need for standardized assessment methods and ethical considerations in integrating LLMs into healthcare. Despite their potential, current limitations suggest cautious use, with ongoing validation and supervision necessary for their application in breast cancer management.