June 2024 | Sarek A Shen, MD, MS; Carlos A Perez-Heydrich, BS; Deborah X. Xie, MD; Jason Nellis, MD
This study compares the readability, understandability, and accuracy of responses from ChatGPT and traditional web searches for patient questions related to otolaryngology. The researchers evaluated 54 questions categorized into three groups: Fact, Policy, and Diagnosis and Recommendations. ChatGPT responses had lower readability (Flesch Reading Ease: 42.3 ± 13.1 vs. 55.6 ± 10.5 for web search, p<0.001) but similar understandability (93.8% vs. 93.5%, p=0.17). ChatGPT performed better in Diagnosis and Recommendations questions (p<0.01), while there was no difference in Fact or Policy questions (p=0.15 and p=0.22, respectively). Adding a prompt to respond at a 6th-grade level improved ChatGPT readability (FRE: 55.6 ± 13.6, p<0.01).
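The readability figures above are Flesch Reading Ease (FRE) scores, where higher values indicate easier text. As a rough sketch of how such scores can be computed (the study's exact tooling is not specified in this summary), the open-source textstat package implements the standard FRE formula; the sample answers below are invented for illustration, not taken from the study:

```python
# pip install textstat
import textstat

# Hypothetical response texts, for illustration only (not from the study).
chatgpt_style = ("Otitis media is an infection of the middle ear that often "
                 "follows an upper respiratory illness and may cause ear pain.")
web_style = ("An ear infection happens when germs get into the middle ear. "
             "It can hurt and make it hard to hear.")

# Flesch Reading Ease: higher scores indicate easier text.
# FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
print(f"ChatGPT-style answer FRE: {textstat.flesch_reading_ease(chatgpt_style):.1f}")
print(f"Web-style answer FRE:     {textstat.flesch_reading_ease(web_style):.1f}")
```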
Accuracy scores were higher for ChatGPT (mean 2.87 ± 0.34) compared to web search (2.61 ± 0.63, p=0.02). Both modalities showed high accuracy in Diagnosis and Recommendations questions, and accuracy was equivalent for Fact and Policy questions. When prompted to respond at a 6th-grade level, ChatGPT's accuracy remained similar (2.81 ± 0.36, p=0.43).
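The accuracy comparison (means ± SD with a p-value) suggests a two-sample test, though the summary does not name the exact statistical test the authors used. A minimal illustration of that kind of comparison with SciPy, using synthetic 3-point ratings constructed only to roughly match the reported means:

```python
import numpy as np
from scipy import stats

# Synthetic 3-point accuracy ratings, constructed to roughly match the
# reported means (2.87 vs. 2.61); NOT the study's raw data.
rng = np.random.default_rng(0)
chatgpt_scores = rng.choice([2, 3], size=54, p=[0.13, 0.87]).astype(float)
websearch_scores = rng.choice([1, 2, 3], size=54, p=[0.05, 0.29, 0.66]).astype(float)

# Welch's t-test (does not assume equal variances between groups).
t_stat, p_value = stats.ttest_ind(chatgpt_scores, websearch_scores, equal_var=False)
print(f"ChatGPT:    {chatgpt_scores.mean():.2f} ± {chatgpt_scores.std(ddof=1):.2f}")
print(f"Web search: {websearch_scores.mean():.2f} ± {websearch_scores.std(ddof=1):.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```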
ChatGPT outperformed web search in answering symptom-based diagnosis questions and was equivalent in providing medical facts and established policies. Proper prompting improved readability without affecting accuracy. The study highlights ChatGPT's potential as a patient education tool but emphasizes the need to inform patients of its benefits and limitations; the authors also note that ChatGPT may carry biases and that its responses lack figures and diagrams. Overall, ChatGPT can provide accurate and readable responses to a range of patient questions, but further research is needed to fully understand its role in healthcare.
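For readers who want to experiment with the 6th-grade prompting strategy described above, a minimal sketch against the OpenAI Python client follows. The model name, prompt wording, and helper function are illustrative assumptions, not the study's exact protocol:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, sixth_grade: bool = False) -> str:
    """Ask a patient question, optionally constrained to a 6th-grade reading level."""
    system = "You are a helpful assistant answering patient health questions."
    if sixth_grade:
        system += " Respond at a 6th-grade reading level."
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; the study's exact model/version may differ
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content


print(ask("What causes ear infections?", sixth_grade=True))
```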