Evaluating ChatGPT-3.5 and ChatGPT-4.0 Responses on Hyperlipidemia for Patient Education

May 25, 2024 | Thomas J. Lee, Abhinav K. Rao, Daniel J. Campbell, Navid Radfar, Manik Dayal, Ayham Khrais
This study evaluates the accuracy, comprehensibility, and response length of ChatGPT-3.5 and ChatGPT-4.0 for patient education on hyperlipidemia. The researchers compared the two versions of the AI chatbot using 25 frequently asked questions drawn from the Cleveland Clinic's hyperlipidemia FAQ. Each question was posed in three ways: with no prompting, with patient-friendly prompting, and with physician-level prompting. Responses were graded as incorrect, partially correct, or correct, and their reading grade level and word count were recorded.

Overall, ChatGPT-4.0 answered correctly more often (74.67%) than ChatGPT-3.5 (69.33%), but the difference was not statistically significant. ChatGPT-3.5 produced responses with a significantly higher reading grade level and word count than ChatGPT-4.0. ChatGPT-4.0's more concise, readable output was in line with most online medical resources, although it still exceeded the National Institutes of Health's (NIH) recommended eighth-grade reading level. The paid version (ChatGPT-4.0) also adapted its responses to the prompting style more effectively. Both versions were generally accurate but sometimes incomplete.

Because accuracy did not differ significantly between the free and paid versions, healthcare providers can recommend ChatGPT as a source of patient education regardless of version; the paid version's more adaptable and readable responses make it the better choice when available. Future research should explore diverse question formulations and how ChatGPT handles incorrect information.
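To make the methodology concrete, the sketch below shows how such an evaluation could be scripted in Python. It is a minimal illustration, not the authors' code: the Flesch-Kincaid grade stands in for whichever readability formula the study used, and the partially-correct/incorrect tallies are assumed, since only the correct percentages are reported above.

import textstat                      # pip install textstat
from scipy.stats import chi2_contingency

def score_response(text: str) -> dict:
    """Compute the two per-response metrics the study recorded."""
    return {
        "grade_level": textstat.flesch_kincaid_grade(text),  # reading grade level
        "word_count": len(text.split()),
    }

sample = ("Hyperlipidemia means there is too much fat, such as cholesterol, "
          "in your blood. It raises your risk of heart disease and stroke.")
print(score_response(sample))

# Compare correct / partially correct / incorrect counts between versions
# (75 responses each: 25 questions x 3 prompting styles). The correct
# counts match the reported 74.67% and 69.33%; the remaining split is
# assumed for illustration.
counts = [
    [56, 13, 6],   # ChatGPT-4.0: correct, partially correct, incorrect
    [52, 15, 8],   # ChatGPT-3.5
]
chi2, p, dof, _ = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # p > 0.05 -> no significant difference

A chi-square test of independence, as sketched here, is a standard way to compare categorical accuracy outcomes between two groups; with counts this close, it returns p well above 0.05, consistent with the study's finding of no significant accuracy difference.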