Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models

25 Jan 2024 | HAMIDEH GHANADIAN, ISAR NEJADGHOLI, HUSSEIN AL OSMAN
This paper addresses the challenge of suicidal ideation detection by using generative AI models to create synthetic data. The authors extract social factors from the psychology literature to guide the data generation process, ensuring that the generated text covers essential information related to suicidal ideation. They benchmark their approach against state-of-the-art NLP classification models, specifically those based on the BERT family, using the University of Maryland Suicidality dataset (UMD). The results show that their synthetic data-driven method, informed by social factors, achieves a consistent F1-score of 0.82 for both BERT-family models evaluated, outperforming conventional models trained on real-world data. Notably, combining only 30% of the UMD dataset with the synthetic data raises performance to an F1-score of 0.88 on the UMD test set. The study highlights the cost-effectiveness of synthetic data and its potential to address data scarcity and increase diversity in data representation, making it a valuable tool for improving mental health support systems.
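The mixed-data experiment described above (keep a fraction of the real UMD training data, add the synthetic data, then score with F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `mix_real_and_synthetic` and `f1_score` are hypothetical, labels are assumed to be binary (1 = suicidal ideation, 0 = not), and F1 is computed for the positive class only, whereas the paper may use a different averaging scheme.

```python
import random
from typing import List, Tuple

# Hypothetical representation of a labeled example: (text, label),
# assuming binary labels where 1 = suicidal ideation and 0 = not.
Example = Tuple[str, int]


def mix_real_and_synthetic(real: List[Example], synthetic: List[Example],
                           real_fraction: float = 0.3, seed: int = 0) -> List[Example]:
    """Keep a random `real_fraction` of the real data and append all synthetic data."""
    rng = random.Random(seed)             # seeded for reproducible subsets
    k = round(len(real) * real_fraction)  # e.g. 30% of the real training set
    subset = rng.sample(real, k)          # sample without replacement
    mixed = subset + list(synthetic)
    rng.shuffle(mixed)                    # avoid ordering effects during training
    return mixed


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Binary F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0                        # no true positives: precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy placeholder data, not drawn from UMD or the paper's synthetic set.
    real = [(f"real post {i}", i % 2) for i in range(10)]
    synthetic = [(f"synthetic post {i}", 1) for i in range(5)]
    train = mix_real_and_synthetic(real, synthetic)
    print(len(train))                                # 3 real + 5 synthetic
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))      # precision 1.0, recall 2/3
```

In practice the classifier (for example, a BERT-family model) would be trained on the mixed set and evaluated on the untouched real test split; the sketch only shows the data-mixing step and the metric.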