2024 | Patricia Garcia, MD; Stephen P. Ma, MD, PhD; Shreya Shah, MD; Margaret Smith, MBA; Yejin Jeong, BA; Anna Devon-Sand, MPH; Ming Tai-Seale, PhD, MPH; Kevin Takazawa, BBA; Danyelle Clutter, MBA; Kyle Vogt, BA; Carlene Lugtu, MCIM; Matthew Rojo, MS; Steven Lin, MD; Tait Shanafelt, MD; Michael A. Pfeffer, MD; Christopher Sharp, MD
A 5-week quality improvement study evaluated the implementation of a large language model (LLM) to draft responses to patient inbox messages at Stanford Health Care. The LLM, which was HIPAA-compliant and integrated with the electronic health record (EHR), generated draft replies for clinicians. Of 197 enrolled clinicians, 162 were included in the analysis. The mean utilization rate of AI-generated draft replies was 20%. Physician task load scores and work exhaustion scores showed statistically significant reductions, but reply action time, write time, and read time did not change. Clinicians reported positive perceptions of the tool's utility and time-saving potential, though some noted issues with the tone and brevity of draft messages, and qualitative feedback highlighted the need for improvements in tone, brevity, and personalization. The findings suggest that generative AI can be spontaneously adopted and can improve clinician well-being even without measured time savings, but further research is needed to optimize its use and address limitations in time efficiency and user experience. The study underscores the importance of evaluating AI implementation in clinical practice to guide future development and organizational strategies.