Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients

2024 | Francesco Cappellani, Kevin R. Card, Carol L. Shields, Jose S. Pulido and Julia A. Haller
This study evaluates the reliability and accuracy of ChatGPT in providing information on ophthalmic diseases and their management to patients. Five diseases from each of eight ophthalmic subspecialties were assessed (40 diseases in total). Three questions were asked for each disease: "What is x?", "How is x diagnosed?", and "How is x treated?" (x = disease name). Responses were graded against American Academy of Ophthalmology (AAO) guidelines, with scores ranging from -3 (potentially harmful) to 2 (correct and complete).
Of the 120 questions, 93 (77.5%) scored ≥1, indicating at least some correct information, while 27 (22.5%) scored ≤-1, including 9 (7.5%) that scored -3. The overall median scores were 2 for "What is x?", 1.5 for "How is x diagnosed?", and 1 for "How is x treated?", though these differences were not statistically significant. ChatGPT provided incomplete, incorrect, and in some cases potentially harmful information about common ophthalmic conditions. It may be a useful tool for patient education but requires human medical supervision to ensure accuracy and safety. The study highlights the need for careful evaluation of AI-generated medical information, whose accuracy can vary with disease familiarity and training data. While ChatGPT shows promise, it is not yet sufficient for clinical use without human oversight; further research is needed to assess its accuracy in ophthalmology and other medical fields.
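The evaluation protocol above (grade each response on a -3 to 2 scale against AAO guidelines, then summarize per question type by median and by the share scoring ≥1) can be sketched in Python. The grades below are hypothetical placeholders for illustration, not the study's actual per-response data:

```python
from statistics import median

# Hypothetical grades for illustration only; the study's per-response scores
# are not reproduced here. Scale: -3 (potentially harmful) to 2 (correct and
# complete), graded against AAO guidelines.
grades = {
    "What is x?":          [2, 2, 1, 2, -1],
    "How is x diagnosed?": [2, 1, 1.5, 2, -3],
    "How is x treated?":   [1, 1, 0, 2, -1],
}

for question, scores in grades.items():
    n_some_correct = sum(1 for s in scores if s >= 1)  # scores with at least some correct info
    print(f"{question}: median={median(scores)}, "
          f"{n_some_correct}/{len(scores)} scored >= 1")
```

A median (rather than a mean) matches the study's reporting and is robust to the occasional -3 outlier on an otherwise well-answered question type.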