27 Feb 2024 | Maram Hasanain, Fatema Ahmed, Firoj Alam
This paper presents ArPro, the largest annotated dataset for detecting propaganda techniques in Arabic news articles. The dataset consists of 8,000 paragraphs from 2,800 news articles, annotated at the text span level using a taxonomy of 23 propagandistic techniques. The study also evaluates how well GPT-4 detects propaganda techniques and their manifestations in text. GPT-4's performance degrades as the task becomes more granular, and it struggles with span-level detection across multiple languages. Fine-tuned models outperform GPT-4 prompted in zero-shot settings. The study further highlights the challenges of annotating propaganda techniques, including subjectivity, contextual variations, and linguistic nuances, and shows that span-level detection is a complex task requiring careful annotation. The dataset and resources are released to the community for further research, contributing a comprehensive dataset and insights into the performance of large language models to the field of computational propaganda.
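To make the notion of task granularity concrete, the sketch below shows one way a span-level annotation could be represented and then collapsed into coarser views (which techniques appear in a paragraph, and whether the paragraph is propagandistic at all). It is an illustration only: the `Span` class, the helper names, the English example sentence, and the two technique labels are assumptions made for this sketch, not the ArPro release format or its exact label strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive
    technique: str  # one label from a technique taxonomy (placeholder names here)


def make_span(text: str, phrase: str, technique: str) -> Span:
    """Locate the first occurrence of `phrase` in `text` and wrap it as a Span."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return Span(start, start + len(phrase), technique)


def to_multilabel(spans: list[Span]) -> set[str]:
    """Coarser view: the set of techniques present in the paragraph."""
    return {s.technique for s in spans}


def to_binary(spans: list[Span]) -> bool:
    """Coarsest view: does the paragraph contain any propagandistic span?"""
    return len(spans) > 0


# Hypothetical example paragraph and labels, invented for illustration.
paragraph = "They are traitors, and everyone knows the plan will destroy us."
spans = [
    make_span(paragraph, "traitors", "Name_Calling"),
    make_span(paragraph, "will destroy us", "Appeal_to_Fear"),
]

for s in spans:
    print(s.technique, "->", paragraph[s.start:s.end])
print("techniques:", sorted(to_multilabel(spans)))
print("propagandistic:", to_binary(spans))
```

The span-level view carries strictly more information than the other two: a system must identify both the technique and its exact boundaries, which is consistent with the summary's observation that performance drops as the task becomes more granular.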