Understanding "Backtranslate what you are saying and I will tell who you are"

This paper explores whether backtranslation and expansion modules can improve performance on author profiling tasks. The authors hypothesize that semantically enriching a user's text corpus through backtranslation and expansion improves classification accuracy. The proposed framework has three components: a backtranslation module, an expansion module, and a state-of-the-art classifier. Backtranslation translates an author's text into a target language and then back into the original language, producing a reworded version of the original. Expansion combines the original text with its back-translated version, and the resulting augmented text is passed to the classifier.
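To make the two preprocessing modules concrete, here is a minimal Python sketch. It assumes a generic `translate(text, source, target)` callable, since the translation system itself is not described in this summary; `translate`, `toy_translate`, `backtranslate`, `expand`, and `augment_author` are hypothetical names. The toy translator is only a word-swap stand-in so the example runs offline, not a real machine translation model.

```python
from typing import Callable, Dict, List

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
# In a real pipeline this would wrap a machine translation model or service.
Translator = Callable[[str, str, str], str]


def backtranslate(text: str, translate: Translator, source: str, pivot: str) -> str:
    """Translate text into the pivot language, then back into the source language."""
    forward = translate(text, source, pivot)
    return translate(forward, pivot, source)


def expand(original: str, back_translated: str, sep: str = " ") -> str:
    """Expansion: concatenate the original text with its back-translated version."""
    return original + sep + back_translated


def augment_author(texts: List[str], translate: Translator,
                   source: str = "en", pivot: str = "it") -> List[str]:
    """Apply backtranslation followed by expansion to every text of one author."""
    return [expand(t, backtranslate(t, translate, source, pivot)) for t in texts]


# Toy word-level "translator" (assumption, NOT a real MT system):
# it lowercases the input and swaps words using two small dictionaries.
_EN_IT: Dict[str, str] = {"this": "questo", "is": "è", "fake": "falso", "news": "notizie"}
_IT_EN: Dict[str, str] = {"questo": "this", "è": "is", "falso": "false", "notizie": "news"}


def toy_translate(text: str, source: str, target: str) -> str:
    tables = {("en", "it"): _EN_IT, ("it", "en"): _IT_EN}
    table = tables[(source, target)]  # raises KeyError for unsupported pairs
    return " ".join(table.get(w, w) for w in text.lower().split())


if __name__ == "__main__":
    print(augment_author(["This is fake news"], toy_translate))
```

In this toy run the round trip turns "fake" into "false", so the expanded sample keeps the original wording and appends a lexical variant. That illustrates, in miniature, the kind of semantic enrichment the authors hypothesize helps the classifier.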
The framework is evaluated on three author profiling datasets from CLEF, covering fake news, hate speech, and irony and stereotype detection. The results show that the backtranslation and expansion modules significantly improve classification performance on all three datasets. The study also examines how the choice of target language (Italian, German, Japanese, or Turkish) affects results and compares several classifiers (CNN, RoBERTa, GPT-2, and SVM). A qualitative analysis indicates that backtranslation enriches the information content of the texts, most notably for hate speech detection. The main limitations noted are the computational cost of backtranslation and the larger size of the augmented samples. Future work will explore further improvements and the effect of other target languages.
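The two limitations can be made concrete with a back-of-the-envelope estimate. The sketch below is a hypothetical Python helper, not the authors' measurement. It assumes one forward and one backward translation per text and per target language. It also assumes a back-translated text is roughly as long as its source, controlled by the made-up `bt_length_ratio` parameter. Tokens are counted by whitespace splitting, and the example texts are invented.

```python
from dataclasses import dataclass
from typing import List

# The four target (pivot) languages compared in the study, as ISO 639-1 codes.
PIVOTS: List[str] = ["it", "de", "ja", "tr"]


@dataclass
class AugmentationCost:
    translation_calls: int  # MT requests needed to augment the corpus for all pivots
    original_tokens: int    # whitespace tokens before augmentation
    expanded_tokens: int    # whitespace tokens of the expanded corpus for one pivot


def estimate_cost(texts: List[str], n_pivots: int = len(PIVOTS),
                  bt_length_ratio: float = 1.0) -> AugmentationCost:
    """Rough cost of backtranslation + expansion (hypothetical helper).

    Assumptions, not taken from the paper: each text needs one forward and
    one backward translation per pivot language, and a back-translated text
    has about bt_length_ratio times as many tokens as its source.
    """
    original = sum(len(t.split()) for t in texts)
    calls = 2 * len(texts) * n_pivots
    # Expansion keeps the original text and appends its back-translation.
    expanded = round(original * (1 + bt_length_ratio))
    return AugmentationCost(calls, original, expanded)


if __name__ == "__main__":
    author_texts = ["this is fake news", "read it before it is deleted", "wake up people"]
    print(estimate_cost(author_texts))
```

Under these assumptions, translation requests grow linearly with both corpus size and the number of target languages, and each expanded sample is about twice as long as the original. The second point matters for classifiers with a fixed maximum input length, such as transformer models.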

2024 | Marco Siino | Francesco Lomonaco | Paolo Rosso