2024 | Joel Hake, MD, Miles Crowley, MD, MPH, Allison Coy, MD, Denton Shanks, DO, MPH, Aundria Eoff, MD, Kalee Kirner-Voss, MD, Gurpreet Dhanda, MD, Daniel J. Parente, MD, PhD
The study evaluates the quality, accuracy, and bias of ChatGPT summaries of 140 peer-reviewed medical abstracts drawn from 14 journals. Physicians rated the summaries as high in quality (median 90) and accuracy (median 92.5), and low in bias (median 0). ChatGPT produced summaries that were 70% shorter than the original abstracts. While serious inaccuracies and hallucinations were rare, ChatGPT's ability to classify the relevance of articles to specific medical specialties was only modest. The study concludes that ChatGPT can be a useful tool for busy clinicians to quickly screen and prioritize research articles, but that it should not replace critical evaluation of full texts for life-critical medical decisions. The authors also developed software (pyJournalWatch) to support this application.
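As a rough illustration of the workflow the study describes, the minimal Python sketch below asks an OpenAI chat model to condense a single abstract. This is not the authors' pyJournalWatch code; the model name, prompt wording, and word budget are assumptions introduced here for illustration.

```python
# Illustrative sketch only -- not the authors' pyJournalWatch pipeline.
# Model name, prompt wording, and word limit are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_abstract(abstract: str, max_words: int = 125) -> str:
    """Ask a ChatGPT model for a condensed summary of one abstract."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You summarize peer-reviewed medical abstracts "
                    "faithfully, without adding claims."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Summarize this abstract in at most {max_words} "
                    f"words:\n\n{abstract}"
                ),
            },
        ],
    )
    return response.choices[0].message.content.strip()
```

In a screening workflow like the one the study envisions, a clinician would run each new abstract through a function like this and skim the shortened summaries to decide which full texts merit critical reading.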