Prediction of outcomes after cardiac arrest by a generative artificial intelligence model

2024 | Simon A. Amacher, Armon Arpagaus, Christian Sahmer, Christoph Becker, Sebastian Gross, Tabita Urben, Kai Tisljar, Raoul Sutter, Stephan Marsch, Sabina Hunziker
A study evaluated the prognostic accuracy of ChatGPT-4, a large generative artificial intelligence model, in predicting mortality and poor neurological outcome in adult cardiac arrest patients, comparing its performance with three established post-cardiac arrest scores: OHCA, CAHP, and PROLOGUE. The study included 713 patients admitted to an intensive care unit after cardiac arrest. ChatGPT-4 showed good discrimination for in-hospital mortality, with an area under the curve (AUC) of 0.85, comparable to the OHCA (0.81), CAHP (0.83), and PROLOGUE (0.84) scores. For poor neurological outcome, ChatGPT-4 achieved a similar AUC of 0.83, again comparable to the established scores. The model's predictions were based on 16 patient-related parameters drawn from these scores.

However, the study also identified instances of "hallucinations," i.e. illogical answers generated by ChatGPT-4, which were corrected by re-prompting the model. The study highlights the potential of large language models (LLMs) for prognostication in clinical settings but emphasizes the need for human oversight given the risk of inaccurate predictions. Although ChatGPT-4 performed comparably to validated scores, further research is needed to address the limitations of LLMs, including potential biases and the need for structured, high-quality training data. The authors conclude that ChatGPT-4 may be a useful tool for early risk prediction in cardiac arrest patients but requires careful integration into clinical practice under human supervision.
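The AUC values above measure discrimination: the probability that a model ranks a randomly chosen patient with the outcome above one without it. A minimal sketch of how such an AUC can be computed is shown below; the labels and risk scores are invented for illustration and are not data from the study.

```python
# Minimal sketch: area under the ROC curve (AUC) for a binary outcome,
# the metric used to compare ChatGPT-4 with the OHCA, CAHP, and
# PROLOGUE scores. Computed here via the Mann-Whitney U statistic.

def auc(labels, scores):
    """AUC = probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cohort: 1 = in-hospital death, scores = predicted risk.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.5, 0.7]
print(auc(labels, model_risk))  # one value per model allows the kind of comparison reported above
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values around 0.81 to 0.85 are described as good discrimination.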