June 2024 | Sarek A Shen, MD, MS; Carlos A Perez-Heydrich, BS; Deborah X. Xie, MD; Jason Nellis, MD
This study compares the readability, understandability, and accuracy of responses from ChatGPT and traditional web searches for patient questions related to otolaryngology. The researchers evaluated 54 questions categorized into three groups: Fact, Policy, and Diagnosis and Recommendations. ChatGPT responses had lower readability (Flesch Reading Ease: 42.3 ± 13.1 vs. 55.6 ± 10.5 for web search, p<0.001) but similar understandability (93.8% vs. 93.5%, p=0.17). ChatGPT performed better in Diagnosis and Recommendations questions (p<0.01), while there was no difference in Fact or Policy questions (p=0.15 and p=0.22, respectively). Adding a prompt to respond at a 6th-grade level improved ChatGPT readability (FRE: 55.6 ± 13.6, p<0.01).
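The readability figures above are Flesch Reading Ease (FRE) scores, where higher values indicate easier text. As a rough sketch of how such scores can be computed (the study's exact tooling is not specified in this summary), the open-source textstat package implements the standard FRE formula; the sample answers below are invented for illustration, not taken from the study:

```python
# pip install textstat
import textstat

# Hypothetical response texts, for illustration only (not from the study).
chatgpt_style = ("Otitis media is an infection of the middle ear that often "
                 "follows an upper respiratory illness and may cause ear pain.")
web_style = ("An ear infection happens when germs get into the middle ear. "
             "It can hurt and make it hard to hear.")

# Flesch Reading Ease: higher scores indicate easier text.
# FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
print(f"ChatGPT-style answer FRE: {textstat.flesch_reading_ease(chatgpt_style):.1f}")
print(f"Web-style answer FRE:     {textstat.flesch_reading_ease(web_style):.1f}")
```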
Accuracy scores were higher for ChatGPT (mean 2.87 ± 0.34) compared to web search (2.61 ± 0.63, p=0.02). Both modalities showed high accuracy in Diagnosis and Recommendations questions, and accuracy was equivalent for Fact and Policy questions. When prompted to respond at a 6th-grade level, ChatGPT's accuracy remained similar (2.81 ± 0.36, p=0.43).
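The accuracy comparison (means ± SD with a p-value) suggests a two-sample test, though the summary does not name the exact statistical test the authors used. A minimal illustration of that kind of comparison with SciPy, using synthetic 3-point ratings constructed only to roughly match the reported means:

```python
import numpy as np
from scipy import stats

# Synthetic 3-point accuracy ratings, constructed to roughly match the
# reported means (2.87 vs. 2.61); NOT the study's raw data.
rng = np.random.default_rng(0)
chatgpt_scores = rng.choice([2, 3], size=54, p=[0.13, 0.87]).astype(float)
websearch_scores = rng.choice([1, 2, 3], size=54, p=[0.05, 0.29, 0.66]).astype(float)

# Welch's t-test (does not assume equal variances between groups).
t_stat, p_value = stats.ttest_ind(chatgpt_scores, websearch_scores, equal_var=False)
print(f"ChatGPT:    {chatgpt_scores.mean():.2f} ± {chatgpt_scores.std(ddof=1):.2f}")
print(f"Web search: {websearch_scores.mean():.2f} ± {websearch_scores.std(ddof=1):.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```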
ChatGPT outperformed web search in answering symptom-based diagnosis questions and was equivalent in providing medical facts and established policies. Proper prompting improved readability without affecting accuracy. The study highlights ChatGPT's potential as a patient education tool but emphasizes the need to inform patients of its benefits and limitations; the authors also note that ChatGPT may carry biases and that its responses lack figures and diagrams. Overall, ChatGPT can provide accurate and readable responses to a range of patient questions, but further research is needed to fully understand its role in healthcare.
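For readers who want to experiment with the 6th-grade prompting strategy described above, a minimal sketch against the OpenAI Python client follows. The model name, prompt wording, and helper function are illustrative assumptions, not the study's exact protocol:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, sixth_grade: bool = False) -> str:
    """Ask a patient question, optionally constrained to a 6th-grade reading level."""
    system = "You are a helpful assistant answering patient health questions."
    if sixth_grade:
        system += " Respond at a 6th-grade reading level."
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; the study's exact model/version may differ
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content


print(ask("What causes ear infections?", sixth_grade=True))
```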