Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation


2024 | Elizabeth C. Stade, Shannon Wiltsey Stirman, Lyle H. Ungar, Cody L. Boland, H. Andrew Schwartz, David B. Yaden, João Sedoc, Robert J. DeRubeis, Robb Willer & Johannes C. Eichstaedt
Large language models (LLMs) have the potential to transform behavioral healthcare by supporting, augmenting, or even automating psychotherapy. However, their application in clinical psychology requires careful consideration due to the high stakes involved. This paper outlines a roadmap for the responsible development and evaluation of clinical LLMs in psychotherapy. It begins with a technical overview of clinical LLMs, followed by a discussion of their integration into psychotherapy, highlighting parallels with autonomous vehicle technology. Potential applications in clinical care, training, and research are discussed, along with recommendations for responsible development and evaluation, including centering clinical science, fostering interdisciplinary collaboration, and addressing issues such as assessment, risk detection, transparency, and bias. The paper also outlines a vision for how LLMs might enable large-scale studies of evidence-based interventions and challenge assumptions about psychotherapy.

LLMs, such as GPT-4 and Gemini, are powerful AI systems capable of reading, summarizing, and generating text. They have a wide range of applications, including serving as chatbots, generating essays, translating languages, writing code, and diagnosing illness. While older AI approaches, such as earlier natural language processing methods, have been used in behavioral healthcare for decades, LLMs are still in the early stages of application in this field. Current applications include tailoring LLMs to help peer counselors increase their expressions of empathy and identifying therapists' and clients' behaviors within a motivational interviewing framework.

LLMs have the potential to improve engagement and retention in digital mental health applications, but engagement alone is not sufficient for producing change. Clinical LLMs should therefore target clinical improvement, such as reductions in symptoms or impairment, and the prevention of risks and adverse events. Rigorous evaluation is essential, with a hierarchical prioritization of risk and safety, followed by feasibility, acceptability, and effectiveness.

Interdisciplinary collaboration between clinical scientists, engineers, and technologists is crucial for the development of clinical LLMs. Behavioral health experts can provide guidance on fine-tuning models, addressing the use of real patient data, and ensuring ethical development.

Clinical LLMs must prioritize accurate risk detection and mandated reporting, particularly the identification of suicidal or homicidal ideation, child or elder abuse, and intimate partner violence. They should also be "healthy," avoiding undesirable behaviors such as expressions akin to depression or narcissism. In addition, clinical LLMs should integrate psychodiagnostic assessment; be responsive and flexible; stop when they are not helping or not confident; be fair, inclusive, and free from bias; be empathetic within appropriate limits; and be transparent about being AI. These design criteria are essential for the responsible development and evaluation of clinical LLMs in psychotherapy.
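To make two of these design criteria concrete (risk detection before responding, and transparency about being AI), the following is a minimal, hypothetical Python sketch of a safety-gating wrapper around a text-generation call. It is not an implementation from the paper: the names (ClinicalChatSession, screen_for_risk) and the keyword-based screen are placeholders, and a deployed clinical LLM would instead rely on validated risk classifiers and clinician-designed escalation and mandated-reporting protocols.

```python
from dataclasses import dataclass, field
from typing import Callable, List

AI_DISCLOSURE = "Note: I am an AI program, not a licensed clinician."
CRISIS_MESSAGE = (
    "It sounds like you may be in crisis. If you are in immediate danger, "
    "please contact local emergency services or a crisis hotline."
)

# Placeholder screen only; a real system would use a validated risk classifier.
RISK_PHRASES = ("kill myself", "end my life", "want to die", "hurt someone")


def screen_for_risk(message: str) -> bool:
    """Flag possible crisis indicators in a user message (illustrative keyword check)."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in RISK_PHRASES)


@dataclass
class ClinicalChatSession:
    """Wraps any text-generation callable with risk gating and AI disclosure."""
    generate_reply: Callable[[str], str]          # e.g., a call into an LLM service
    transcript: List[str] = field(default_factory=list)
    disclosed: bool = False

    def respond(self, user_message: str) -> str:
        self.transcript.append(f"user: {user_message}")

        # 1. Risk and safety are checked before any generated content is returned.
        #    A production system would also trigger human escalation at this point.
        if screen_for_risk(user_message):
            reply = CRISIS_MESSAGE
        else:
            reply = self.generate_reply(user_message)

        # 2. Transparency: disclose AI status at least once per session.
        if not self.disclosed:
            reply = f"{AI_DISCLOSURE}\n{reply}"
            self.disclosed = True

        self.transcript.append(f"assistant: {reply}")
        return reply


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without an external model or API.
    session = ClinicalChatSession(generate_reply=lambda msg: "Tell me more about that.")
    print(session.respond("I've been feeling overwhelmed at work lately."))
```

Keeping the safety and disclosure logic in a thin wrapper outside the model is one way to let clinicians audit and update those rules independently of the underlying generation component.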