2024 | Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park
This paper investigates the capabilities of large language models (LLMs) in making health predictions based on contextual information and physiological data from wearable sensors. The authors evaluate 12 state-of-the-art LLMs on four public health datasets (PMData, LifeSnaps, GLOBEM, and AW_FB) across ten consumer health prediction tasks, including mental health, activity tracking, metabolism, and sleep assessment. Their fine-tuned model, HealthAlpaca, outperforms much larger models like GPT-3.5, GPT-4, and Gemini-Pro in 8 out of 10 tasks. Context enhancement strategies, particularly the inclusion of health knowledge, significantly improve performance, with up to 23.8% improvement observed. The study also highlights the importance of temporal context and the effectiveness of zero-shot and few-shot prompting techniques. The authors conclude by discussing the limitations and future directions, emphasizing the need for ethical considerations and improved explainability in health predictions.