Clinical decision support for bipolar depression using large language models

2024 | Roy H. Perlis, Joseph F. Goldberg, Michael J. Ostacher, Christopher D. Schneck
A study evaluated the effectiveness of large language models (LLMs) in providing clinical decision support for bipolar depression. The researchers developed 50 sets of clinical vignettes representing bipolar depression and presented them to experts in bipolar disorder, who identified 5 optimal next-step pharmacotherapies and 5 poor or contraindicated choices. The same vignettes were presented to an LLM (GPT-4-turbo), with or without augmentation by prompting with recent bipolar treatment guidelines, and the model was asked to identify the optimal next-step pharmacotherapy.

The guideline-augmented model prioritized the expert-designated optimal choice for 508 of 1000 vignettes (50.8%), with a Cohen's kappa of 0.31. For 120 vignettes (12.0%), at least one model choice was among the poor or contraindicated treatments. By comparison, the un-augmented model identified the optimal treatment for 234 (23.0%) of the vignettes, and a sample of community clinicians identified the optimal choice for 23.1%. To examine possible bias, the study stratified model responses by gender and race; the results showed some evidence of bias in model performance.

The study found that LLMs prompted with evidence-based guidelines represent a promising, scalable strategy for clinical decision support. However, strategies to prevent clinician overreliance on such models and to address the possibility of bias will be needed. The study concluded that integrating treatment guidelines with clinical context can yield a decision-support tool, and that randomized trials are needed to determine whether the augmented model can improve clinical outcomes. Its limitations include the possibility that critical information omitted from the vignettes could affect prediction. The study also noted the need for further research on the extent to which incorporating guidelines improves treatment selection in other clinical contexts.
The study found that the augmented model performed better than a sample of community clinicians, suggesting the potential utility of LLMs in providing a guideline-based standard of care in clinical settings.
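The agreement figure reported for the augmented model is a Cohen's kappa. Kappa corrects raw agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed proportion of matching labels. p_e is the sum, over labels, of the product of each rater's marginal rate for that label.

The sketch below is a minimal, generic implementation for two raters assigning categorical labels. It is not the study's analysis code: the summary does not describe how the agreement categories were defined. The function name `cohens_kappa` and the toy labels "A", "B", "C" are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, multiply the two raters' marginal rates, then sum.
    # Labels used only by rater_b contribute zero, so iterating over rater_a's labels suffices.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in counts_a)
    if p_e == 1:
        # Degenerate case: both raters used one identical label for every item.
        # Kappa is undefined here; returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (not study data): an "expert" and a "model" each pick one option per item.
expert = ["A", "A", "B", "B", "C", "C"]
model = ["A", "B", "B", "B", "C", "A"]
print(round(cohens_kappa(expert, model), 3))  # p_o = 4/6, p_e = 1/3, so kappa = 0.5
```

In this toy example, the raters agree on 4 of 6 items (p_o ≈ 0.667). Chance agreement is 1/3, so kappa is 0.5, noticeably lower than raw agreement. By the same logic, a kappa of 0.31 sits well below the augmented model's raw 50.8% match rate. Under the widely used Landis and Koch benchmarks, values between 0.21 and 0.40 are conventionally described as "fair" agreement.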