Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers
15 April 2024 | Ehtesham Hashmi, Sule Yildirim Yayilgan, Sarang Shaikh
This study explores the use of multilingual transformers to enhance sentiment prediction in code-mixed text combining Roman Urdu and English. The research addresses the challenges of sentiment analysis in low-resource languages, where traditional methods struggle with syntactic ambiguity and limited linguistic resources. Three transformer-based models are evaluated: ELECTRA, code-mixed BERT (cm-BERT), and Multilingual Bidirectional and Auto-Regressive Transformers (mBART). mBART outperformed the other models, achieving an overall F1-score of 0.73 for sentiment prediction on code-mixed text. The study also applies topic modeling with Latent Dirichlet Allocation (LDA) to uncover shared characteristics and patterns across sentiment classes, revealing that the neutral and positive classes share similar vocabulary, and it investigates the impact of generative configuration parameters such as temperature, top-k, and top-p on sentiment analysis performance. The findings highlight the potential of multilingual transformers for handling complex linguistic contexts and the value of advanced NLP techniques for sentiment analysis in multilingual, code-mixed digital communications. The work contributes to computational linguistics by providing a robust approach for processing low-resource languages and improving the accuracy of sentiment prediction in code-mixed text.
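To make the classification setup concrete, here is a minimal sketch of applying an mBART-style model to three-way sentiment prediction with the Hugging Face transformers library. The checkpoint name, label ordering, and the assumption of a classification head fine-tuned on a Roman Urdu-English sentiment corpus are ours, not the paper's; out of the box the head below is randomly initialized.

```python
# Hypothetical sketch: scoring a code-mixed tweet with an mBART encoder
# plus a 3-way sentiment classification head (fine-tuning assumed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "facebook/mbart-large-50"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=3,  # assumed label order: negative / neutral / positive
)

def predict_sentiment(tweet: str) -> int:
    """Return the argmax sentiment class index for one code-mixed tweet."""
    inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item()

# Example code-mixed (Roman Urdu + English) input
print(predict_sentiment("yeh movie bohat achi thi, really enjoyed it!"))
```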
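The LDA step can likewise be sketched in a few lines. This is a minimal illustration with scikit-learn, assuming tweets have already been grouped by sentiment class; the toy corpus and topic count are placeholders, not the paper's data or settings.

```python
# Hypothetical sketch: fit an LDA topic model per sentiment class and
# inspect top words to compare vocabulary across classes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus standing in for one sentiment class's tweets
neutral_tweets = ["match kal hai", "office jana hai today", "weather theek hai"]

vectorizer = CountVectorizer(max_df=0.95, min_df=1, stop_words="english")
doc_term = vectorizer.fit_transform(neutral_tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words per topic; repeating this per class reveals overlap,
# e.g. the neutral/positive similarity the study reports.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {idx}: {', '.join(top)}")
```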
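Finally, the generative configuration parameters the study examines map onto standard decoding knobs. Below is a sketch of sweeping temperature, top-k, and top-p via the transformers generate() API; the checkpoint, prompt, and grid values are assumptions for illustration, and the paper's exact evaluation protocol is not reproduced here.

```python
# Hypothetical sketch: sweep sampling parameters (temperature, top-k, top-p)
# for a seq2seq model and inspect how outputs vary.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50")

prompt = "yeh din bohat acha tha"  # placeholder code-mixed input
inputs = tokenizer(prompt, return_tensors="pt")

for temperature, top_k, top_p in [(0.7, 50, 0.90), (1.0, 40, 0.95)]:
    output = model.generate(
        **inputs,
        do_sample=True,          # sampling must be on for these knobs to apply
        temperature=temperature,  # flattens/sharpens the token distribution
        top_k=top_k,             # keep only the k most likely next tokens
        top_p=top_p,             # nucleus sampling: smallest set with mass >= p
        max_new_tokens=30,
    )
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"T={temperature} k={top_k} p={top_p}: {text}")
```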