Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles

27 Feb 2024 | Maram Hasanain, Fatema Ahmed, Firoj Alam
The paper addresses the challenge of identifying propaganda in news articles, particularly in Arabic, by developing the largest annotated Arabic dataset to date, *ArPro*. The dataset comprises 8K annotated paragraphs drawn from 2.8K news articles, labeled at the text-span level using a taxonomy of 23 propaganda techniques. The study also evaluates GPT-4, a large language model (LLM), on detecting these techniques. Key findings include:

1. **Dataset Development**: *ArPro* is the largest dataset of its kind to date, covering a wide range of topics and news sources. It was annotated using a two-tier taxonomy and a detailed annotation process to ensure accuracy and consistency.
2. **Model Performance**: GPT-4's performance degrades as the task moves from binary classification to fine-grained detection of specific techniques. Fine-tuned models consistently outperform GPT-4 in zero-shot settings, especially on tasks requiring fine-grained detection.
3. **Cross-Language Evaluation**: GPT-4 struggles with span detection across multiple languages, including less-resourced languages such as Polish and Russian.
4. **Future Work**: The authors plan to explore the correlation between propaganda and other phenomena in news reporting, design more sophisticated models for propaganda span detection, and further investigate the capabilities of LLMs in zero-shot and few-shot settings.

The paper provides valuable insights into the challenges of detecting propaganda in text and highlights the limitations of current LLMs in this domain. The *ArPro* dataset and resources are made available to the community to support future research in this area.
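Span-level propaganda detection is typically scored by character overlap between predicted and gold spans rather than exact match, which is part of why performance drops so sharply for fine-grained tasks. The sketch below illustrates one such overlap-based metric; the `(start, end, label)` span representation and the scoring details are assumptions for illustration, not the paper's official evaluation script.

```python
def span_overlap_f1(gold, pred):
    """Character-overlap precision/recall/F1 for labeled spans.

    Each span is a (start, end, label) tuple with a half-open
    character range [start, end). Overlap only counts when the
    technique labels match. Simplified illustration only.
    """
    def chars_by_label(spans):
        # Map each technique label to the set of character offsets it covers.
        covered = {}
        for start, end, label in spans:
            covered.setdefault(label, set()).update(range(start, end))
        return covered

    g, p = chars_by_label(gold), chars_by_label(pred)
    overlap = sum(len(g.get(label, set()) & p.get(label, set()))
                  for label in set(g) | set(p))
    n_gold = sum(len(s) for s in g.values())
    n_pred = sum(len(s) for s in p.values())
    precision = overlap / n_pred if n_pred else 0.0
    recall = overlap / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Example: the prediction covers only half of the gold span,
# so recall is penalized while precision stays perfect.
gold = [(0, 10, "Loaded_Language")]
pred = [(5, 10, "Loaded_Language")]
print(span_overlap_f1(gold, pred))  # (1.0, 0.5, 0.666...)
```

Under a metric like this, a model that finds the right paragraph but draws the span boundary loosely still earns partial credit, while labeling the span with the wrong technique earns none.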