Prediction of outcomes after cardiac arrest by a generative artificial intelligence model

2024 | Simon A. Amacher, Armon Arpagaus, Christian Sahmer, Christoph Becker, Sebastian Gross, Tabita Urben, Kai Tisljar, Raoul Sutter, Stephan Marsch, Sabina Hunziker
A study evaluated the prognostic accuracy of ChatGPT-4, a large generative artificial intelligence model, in predicting mortality and poor neurological outcome in adult cardiac arrest patients, comparing its performance with three established post-cardiac arrest scores: OHCA, CAHP, and PROLOGUE. The study included 713 patients admitted to an intensive care unit after cardiac arrest. ChatGPT-4 showed good discrimination for in-hospital mortality, with an area under the curve (AUC) of 0.85, comparable to the OHCA (0.81), CAHP (0.83), and PROLOGUE (0.84) scores. For poor neurological outcome, ChatGPT-4 achieved a similar AUC of 0.83, again comparable to the established scores. The model's predictions were based on 16 patient-related parameters drawn from these scores.

However, the study also identified instances of "hallucinations," i.e. illogical answers generated by ChatGPT-4, which were corrected by re-prompting the model. The study highlights the potential of large language models (LLMs) for prognostication in clinical settings but emphasizes the need for human oversight given the risk of inaccurate predictions. Although ChatGPT-4 performed comparably to validated scores, further research is needed to address the limitations of LLMs, including potential biases and the need for structured, high-quality training data. The authors conclude that ChatGPT-4 may be a useful tool for early risk prediction in cardiac arrest patients but requires careful integration into clinical practice under human supervision.
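The AUC values above measure discrimination: the probability that a model ranks a randomly chosen patient with the outcome above one without it. A minimal sketch of how such an AUC can be computed is shown below; the labels and risk scores are invented for illustration and are not data from the study.

```python
# Minimal sketch: area under the ROC curve (AUC) for a binary outcome,
# the metric used to compare ChatGPT-4 with the OHCA, CAHP, and
# PROLOGUE scores. Computed here via the Mann-Whitney U statistic.

def auc(labels, scores):
    """AUC = probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cohort: 1 = in-hospital death, scores = predicted risk.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.5, 0.7]
print(auc(labels, model_risk))  # one value per model allows the kind of comparison reported above
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values around 0.81 to 0.85 are described as good discrimination.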