2024 | Yuri Han, BA, Hassaam S. Choudhry, BA, Michael E. Simon, MD, Brian M. Katt, MD
This study evaluates ChatGPT's performance on hand surgery self-assessment exams from 2004 to 2013. ChatGPT answered 36.2% of the 1,583 questions correctly, with higher accuracy on text-only questions (39.2%) than on image-based questions (28.7%). ChatGPT provided elaborations for 59.0% of the questions, with no significant difference in the proportion of elaborations between text-only and image-based questions, although elaborations for image-based questions were longer. ChatGPT answered 91.0% of questions confidently, of which 38.0% were correct; the remaining 8.97% were answered without confidence, and only 17.6% of those were correct.
Even its best performance, 44% on text-only questions, was not a passing score. The study highlights that ChatGPT lacks proficiency in hand surgery knowledge and may provide inaccurate explanations, which could be harmful. Its performance on image-based questions is further limited by its inability to process visual data. The authors suggest that ChatGPT should be used cautiously in medical education, as it may not provide reliable information for hand surgery self-assessment, and they emphasize the need for accurate dissemination of knowledge through AI platforms in healthcare. The study's limitations include potential biases in the data and variability in ChatGPT's responses; further research is needed to assess the reliability of AI responses.