23 January 2024 | Muhamet Kastrati, Zenun Kastrati, Ali Shariq Imran, Marenglen Biba
This study addresses the challenge of sentiment and emotion classification in Twitter posts by leveraging distant supervision and deep learning. The authors collected a large-scale dataset of 17.5 million tweets, automatically labeled with Ekman's six basic emotions using emojis. They compared various conventional machine learning models and deep learning models, including transformer-based models, to establish baseline results. The experimental results and ablation analysis showed that a BiLSTM model with FastText pre-trained word embeddings and an attention mechanism outperformed other models, achieving an F1-score of 70.92% for sentiment classification and 54.85% for emotion detection. The study also explored the impact of dataset size, class imbalance, pre-trained word embeddings, and attention mechanisms on model performance. The findings highlight the effectiveness of distant supervision and deep learning in handling large-scale, real-world datasets for sentiment and emotion analysis.
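
The emoji-based distant supervision described above can be illustrated with a minimal sketch. The emoji-to-emotion lexicon, the function name `distant_label`, and the rule of discarding tweets whose emojis point to more than one emotion are all assumptions made for illustration; the summary does not specify the authors' actual mapping or filtering rules. Removing the emojis from the text, so that a classifier cannot simply read the label off the input, is likewise an assumed design choice.

```python
from typing import Optional, Tuple

# Hypothetical lexicon mapping emojis to Ekman's six basic emotions.
# The mapping actually used in the study is not given in this summary.
EMOJI_TO_EMOTION = {
    "😂": "joy",
    "😊": "joy",
    "😢": "sadness",
    "😭": "sadness",
    "😡": "anger",
    "😱": "fear",
    "🤢": "disgust",
    "😲": "surprise",
}


def distant_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (text_without_emojis, emotion), or None if the tweet is unusable.

    A tweet is kept only when all of its known emojis agree on one emotion
    (an assumed filtering rule, not taken from the study).
    """
    emotions = {EMOJI_TO_EMOTION[ch] for ch in tweet if ch in EMOJI_TO_EMOTION}
    if len(emotions) != 1:
        return None  # no known emoji, or conflicting emotions
    # Strip the label-bearing emojis and normalize whitespace.
    stripped = "".join(ch for ch in tweet if ch not in EMOJI_TO_EMOTION)
    return " ".join(stripped.split()), emotions.pop()


if __name__ == "__main__":
    for raw in ["Missed my train again 😭", "so happy 😂 but 😡", "no emoji here"]:
        print(repr(raw), "->", distant_label(raw))
```

Labels produced this way are noisy, since an emoji does not always reflect the emotion of the text, which is one reason the dataset-size and class-imbalance analyses mentioned in the summary matter.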