Improving Text Classification with Large Language Model-Based Data Augmentation

28 June 2024 | Huanhuan Zhao, Haihua Chen, Thomas A. Ruggles, Yunhe Feng, Debjani Singh and Hong-Jun Yoon
This study investigates the effectiveness of two large language model (LLM)-based data augmentation (DA) methods—generating new samples and rewriting existing samples—for improving text classification performance on imbalanced datasets. The research evaluates these methods on two datasets: the Reuters news data (general topic) and the Mitigation dataset (domain-specific). The findings indicate that generating new samples consistently enhances model performance for both datasets. New samples generally outperform rewritten samples, though careful prompt engineering is crucial for domain-specific data. The effectiveness of DA plateaus after incorporating 10 samples, suggesting that 10 samples per label are sufficient. Combining rewritten and new samples further improves classification results, particularly for minority classes. The study also shows that new samples introduce novel information, while rewritten samples may replace critical terms with synonyms, potentially harming performance. The BERT model with DA methods achieved significant improvements in macro-F1 scores, especially for minority classes. The results highlight the potential of LLM-based DA in addressing class imbalance in text classification tasks. The study contributes to understanding the optimal use of LLMs in data augmentation for text classification, offering insights into effective strategies for improving model performance.
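The two augmentation strategies described above can be sketched in Python. This is a minimal illustration, not the paper's actual pipeline: `call_llm` is a hypothetical stand-in for whatever chat-completion client is available, and the prompt wording, the per-label cap of 10 samples, and the alternation between strategies are assumptions based on the abstract's findings.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would send `prompt`
    # to an LLM API and return the generated text.
    raise NotImplementedError


def make_generation_prompt(label: str, example: str) -> str:
    """Prompt for generating a brand-new sample for a given class."""
    return (
        f"Write a new, original text snippet belonging to the topic "
        f"'{label}'. Here is one example of that topic:\n{example}\n"
        "Introduce different facts and wording from the example."
    )


def make_rewrite_prompt(text: str) -> str:
    """Prompt for paraphrasing an existing sample.

    The abstract notes rewriting can hurt when key terms are replaced
    with synonyms, so the prompt asks to preserve domain-specific terms.
    """
    return (
        "Paraphrase the following text, preserving its meaning and all "
        "domain-specific key terms (do not replace them with synonyms):\n"
        f"{text}"
    )


def augment(label: str, seed_texts: list[str], n_new: int = 10) -> list[str]:
    """Produce up to `n_new` augmented samples for one label.

    The cap defaults to 10 because the study found gains plateau there.
    Alternating the two strategies reflects the finding that combining
    new and rewritten samples works best for minority classes.
    """
    augmented = []
    for i in range(n_new):
        seed = seed_texts[i % len(seed_texts)]
        prompt = (make_generation_prompt(label, seed) if i % 2 == 0
                  else make_rewrite_prompt(seed))
        augmented.append(call_llm(prompt))
    return augmented
```

The augmented texts would then be appended to the minority-class training data before fine-tuning a classifier such as BERT.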