01/10/2024 | Vamsi Krishna Uppalpati, Deb Sanjay Nag
This study evaluates the performance of four AI language models (ChatGPT, Claude AI, Google Bard, and Perplexity AI) in complex medical decision-making scenarios across four key metrics: accuracy, relevance, clarity, and completeness. Fourteen experienced medical professionals rated the models' responses to 14 scenarios on a Likert scale from 1 (bad) to 5 (good). Claude AI performed best, scoring 3.34 for relevance and 3.45 for completeness. ChatGPT's responses were more concise but varied widely, while Google Bard's clarity scores were low. The results indicate that these models are not yet ready for medical decision-making and need further development and fine-tuning. The findings suggest that Claude AI is better at generating contextually relevant and thorough responses, whereas Google Bard's unpredictable clarity points to a need for improved model tuning. The study also discusses ethical considerations and future directions, including the importance of human oversight and of longitudinal studies for improving AI performance in healthcare.