Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach

Nawshad Farruque, Randy Goebel, Sudhakar Sivapalan, Osmar R. Zaiane
Accepted: 9 January 2024 / Published online: 4 April 2024
The paper presents a semi-supervised learning (SSL) framework for depression symptoms detection (DSD) from social media text. The authors address the lack of a comprehensive dataset that reflects both clinical insight and the distribution of depression symptoms in self-disclosed depressed populations. Their method combines three components: a state-of-the-art language model pre-trained on a large corpus of mental health forum text, a zero-shot learning (ZSL) model, and a clinician-annotated dataset. The framework iteratively harvests depression-related samples from a large, self-curated depressive tweets repository (DTR) and re-trains the DSD model on them to improve its accuracy. The clinician-annotated dataset, built from the Twitter timelines of self-disclosed depressed users, is the largest of its kind. The authors discuss the stopping criteria and limitations of the SSL process, emphasizing the importance of preserving the distribution of depression symptoms in the harvested data. The final DSD model achieves significantly better accuracy than its initial version, demonstrating the effectiveness of the approach.
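The summary does not spell out the exact harvesting rule or stopping criteria, so the following is only a minimal sketch of a generic self-training loop of this kind, assuming single-label symptom classes. In each round a supervised model is re-trained on the seed plus harvested data, a post from the unlabelled pool is harvested only when that model and a fixed zero-shot model agree with high confidence, and the loop stops when nothing new is harvested, when a round budget is used up, or when the harvested label distribution drifts too far from the clinician-annotated seed. All names (`self_train`, `keyword_train`, `toy_zero_shot`), the agreement rule, the 0.8 threshold and the Jensen-Shannon drift limit are hypothetical choices for illustration, not the authors' method; the keyword "models" are toy stand-ins for the fine-tuned language model and the ZSL model.

```python
from collections import Counter
from math import log2
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]          # (symptom label, confidence in [0, 1])
Predictor = Callable[[str], Prediction]
Example = Tuple[str, str]               # (post text, symptom label)


def label_distribution(labels: List[str], classes: List[str]) -> List[float]:
    """Relative frequency of each symptom class; eps keeps every entry non-zero."""
    counts = Counter(labels)
    eps = 1e-9
    total = sum(counts[c] + eps for c in classes)
    return [(counts[c] + eps) / total for c in classes]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def self_train(seed: List[Example], pool: List[str],
               train: Callable[[List[Example]], Predictor], zero_shot: Predictor,
               classes: List[str], threshold: float = 0.8,
               max_drift: float = 0.1, max_rounds: int = 5) -> List[Example]:
    """Hypothetical SSL loop: grow the labelled set from `pool` until a stop fires."""
    labelled, remaining = list(seed), list(pool)
    seed_dist = label_distribution([y for _, y in seed], classes)
    for _ in range(max_rounds):                 # stop: round budget exhausted
        model = train(labelled)                 # re-train on seed + harvested data
        harvested = []
        for text in remaining:
            y_sup, c_sup = model(text)
            y_zsl, c_zsl = zero_shot(text)
            # Harvest only confident posts on which both models agree.
            if y_sup == y_zsl and min(c_sup, c_zsl) >= threshold:
                harvested.append((text, y_sup))
        if not harvested:
            break                               # stop: nothing new to harvest
        candidate = labelled + harvested
        drift = js_divergence(seed_dist, label_distribution([y for _, y in candidate], classes))
        if drift > max_drift:
            break                               # stop: symptom distribution drifted
        labelled = candidate
        taken = {t for t, _ in harvested}
        remaining = [t for t in remaining if t not in taken]
    return labelled


# --- Toy stand-ins (hypothetical) for the fine-tuned classifier and the ZSL model ---
def keyword_train(data: List[Example]) -> Predictor:
    vocab: Dict[str, Counter] = {}
    for text, label in data:
        for w in text.lower().split():
            vocab.setdefault(w, Counter())[label] += 1

    def predict(text: str) -> Prediction:
        votes: Counter = Counter()
        for w in text.lower().split():
            votes.update(vocab.get(w, Counter()))
        if not votes:
            return ("none", 0.0)
        label, n = votes.most_common(1)[0]
        return (label, n / sum(votes.values()))

    return predict


ZSL_KEYWORDS = {"sleep": "sleep", "tired": "fatigue"}


def toy_zero_shot(text: str) -> Prediction:
    for w in text.lower().split():
        if w in ZSL_KEYWORDS:
            return (ZSL_KEYWORDS[w], 0.9)
    return ("none", 0.0)


seed = [("cannot sleep again", "sleep"), ("so tired all day", "fatigue")]
pool = ["sleep is impossible", "tired and drained", "nice weather today"]
result = self_train(seed, pool, keyword_train, toy_zero_shot, ["sleep", "fatigue"])
print(len(result))  # 4: both symptom posts harvested, the off-topic post left out
```

The drift check is one simple way to act on the point about preserving the symptom distribution: if a round would make the harvested data over-represent one symptom relative to the clinician-annotated seed, the loop stops rather than accept it. A realistic DSD setup would be multi-label and use transformer confidence scores instead of keyword votes.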