SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

5 Mar 2024 | Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee
SNIFFER is a novel multimodal large language model (MLLM) for detecting and explaining out-of-context (OOC) misinformation, in which authentic images are repurposed with false text. Unlike existing methods that assess image-text consistency without producing explanations, SNIFFER analyzes both the consistency of the image-text content and the relevance of the claim to retrieved evidence, accurately identifying inconsistencies and generating the precise, persuasive explanations that are essential for debunking misinformation.

SNIFFER is built through two-stage instruction tuning on InstructBLIP: the first stage refines the model's concept alignment with news-domain entities, and the second fine-tunes its discriminatory power using language-only, GPT-4-generated instruction data. Augmented with external tools and retrieval, SNIFFER combines internal verification, which detects inconsistencies between text and image, with external verification, which checks the claimed context against retrieved knowledge.

Trained on a large-scale dataset and evaluated on the NewsCLIPpings benchmark, SNIFFER surpasses the original MLLM by over 40% and outperforms state-of-the-art baselines in detection accuracy; it also performs well on other datasets, demonstrating its generalizability. The accuracy and persuasiveness of its explanations are validated through both quantitative analysis and human evaluations.
SNIFFER is further compared with GPT-4V and outperforms it in classification accuracy. Its success in detecting OOC misinformation is attributed to task-specific tuning and the integration of external knowledge, which together enable it to accurately identify inconsistencies between text and images.
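To make the detection pipeline concrete, the following is a minimal, illustrative sketch of how the two verification signals described above (internal image-text consistency and external claim-evidence relevance) could be fused into a label plus a natural-language explanation. The fusion rule, function names, and labels here are assumptions for illustration, not the paper's actual implementation, which delegates this reasoning to the tuned MLLM.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Judgment:
    label: str        # "pristine" or "out-of-context"
    explanation: str  # human-readable rationale for the label

def fuse_verdicts(internal_consistent: bool,
                  evidence_relevant: bool,
                  inconsistent_entity: Optional[str] = None) -> Judgment:
    """Combine the internal (image-text) and external (claim-evidence)
    checks into a final label with a short explanation.

    This hand-written rule is a stand-in for the model's learned
    reasoning: a caption is flagged as out-of-context if either
    verification branch raises an inconsistency.
    """
    if internal_consistent and evidence_relevant:
        return Judgment(
            "pristine",
            "The image content matches the caption, and retrieved "
            "evidence supports the claimed context.")

    reasons = []
    if not internal_consistent:
        # Name the mismatched entity when the internal check surfaces one.
        entity = inconsistent_entity or "a key entity"
        reasons.append(
            f"the image does not depict {entity} mentioned in the text")
    if not evidence_relevant:
        reasons.append(
            "retrieved evidence does not support the claimed context")
    return Judgment("out-of-context",
                    "Flagged because " + " and ".join(reasons) + ".")
```

For example, `fuse_verdicts(False, True, inconsistent_entity="the stated location")` yields an "out-of-context" judgment whose explanation cites the location mismatch, mirroring the kind of entity-grounded rationale the paper reports.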