May 13–17, 2024, Singapore | Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma*, Bo Wang, Ruichao Yang
The paper addresses the challenge of detecting harmful memes on social media, which are often difficult to identify because their meaning is conveyed implicitly. Traditional methods lack explainability, making it hard to understand why a given meme is deemed harmful. The authors propose an explainable approach called ExplainHM, which leverages Large Language Models (LLMs) to generate a multimodal debate between harmless and harmful positions. Reasoning over these conflicting rationales improves both the accuracy and the explainability of harmful meme detection. The approach then fine-tunes a small language model to judge the harmfulness of memes based on a multimodal fusion of text and image information. Extensive experiments on three public meme datasets show that ExplainHM outperforms state-of-the-art methods and provides more informative explanations for its predictions. The paper also includes ablation studies and a human evaluation to validate the effectiveness of the proposed approach.
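The summarized pipeline is two-stage: an LLM first argues both sides of a meme's harmfulness, and a fine-tuned small language model then judges the meme using its content together with the conflicting rationales. The minimal Python sketch below illustrates that flow under stated assumptions; the function names, prompt wording, and the use of an image caption as a stand-in for visual features are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the debate-then-judge pipeline described above.
# All names (generate_debate, build_judge_input, the `LLM` callable, and the
# prompt wording) are illustrative assumptions, not the paper's code.

from dataclasses import dataclass
from typing import Callable

# Assumed interface: a function mapping a text prompt (with an image caption
# standing in for the visual modality) to a text completion.
LLM = Callable[[str], str]


@dataclass
class Debate:
    harmless_rationale: str
    harmful_rationale: str


def generate_debate(llm: LLM, meme_text: str, image_caption: str) -> Debate:
    """Ask the LLM to argue both the harmless and the harmful position."""
    context = f"Meme text: {meme_text}\nImage description: {image_caption}"
    harmless = llm(f"{context}\nArgue that this meme is HARMLESS and explain why.")
    harmful = llm(f"{context}\nArgue that this meme is HARMFUL and explain why.")
    return Debate(harmless_rationale=harmless, harmful_rationale=harmful)


def build_judge_input(meme_text: str, image_caption: str, debate: Debate) -> str:
    """Concatenate the meme content with both conflicting rationales; a small
    language model fine-tuned on such inputs would predict the final label."""
    return (
        f"Meme text: {meme_text}\n"
        f"Image description: {image_caption}\n"
        f"Harmless argument: {debate.harmless_rationale}\n"
        f"Harmful argument: {debate.harmful_rationale}\n"
        "Label (harmless/harmful):"
    )


if __name__ == "__main__":
    # Stub LLM so the sketch runs end to end without any API access.
    def fake_llm(prompt: str) -> str:
        return "Stub rationale for: " + prompt.splitlines()[-1]

    meme_text = "example caption overlaid on the meme"
    image_caption = "a cartoon figure pointing at a crowd"
    debate = generate_debate(fake_llm, meme_text, image_caption)
    print(build_judge_input(meme_text, image_caption, debate))
```

In this sketch the judge sees both rationales alongside the raw meme content, which mirrors the summary's claim that reasoning over conflicting arguments yields more informative explanations than a single-label classifier.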