MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

21 Aug 2024 | Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He
MMFakeBench is a comprehensive benchmark for detecting mixed-source multimodal misinformation (MMD). It addresses a limitation of existing datasets, which typically assume a single forgery source and type, by covering three critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion. The benchmark comprises 12 sub-categories of misinformation forgery types, totaling 11,000 data pairs.

The paper evaluates 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle with mixed-source MMD, highlighting the need for more robust detectors. To address this, the authors propose MMD-Agent, a unified framework that integrates the rationale, action, and tool-use capabilities of LVLM agents, significantly improving detection accuracy and generalization.

Key contributions of the study include:
1. Introducing MMFakeBench, the first comprehensive benchmark for mixed-source MMD.
2. Conducting extensive evaluations of detection methods and LVLMs on MMFakeBench.
3. Proposing MMD-Agent, a framework that improves detection performance and serves as a new baseline for future research.

The study aims to catalyze future research into more realistic mixed-source multimodal misinformation detection and to provide a fair evaluation platform for misinformation detection methods.
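To make the mixed-source framing concrete, the sketch below models benchmark samples labeled by one of the three distortion sources (plus genuine pairs) and collapses them to the binary real/fake decision that a zero-shot detector is scored on. The field names, label strings, and scoring helper are illustrative assumptions, not the actual MMFakeBench schema.

```python
from dataclasses import dataclass

# Hypothetical label set mirroring the three distortion sources plus
# genuine pairs; the real MMFakeBench annotation schema may differ.
SOURCES = (
    "true",
    "textual_veracity_distortion",
    "visual_veracity_distortion",
    "cross_modal_consistency_distortion",
)

@dataclass
class Sample:
    image_path: str   # image half of the image-text pair
    text: str         # accompanying claim or caption
    source: str       # one of SOURCES

def binary_label(sample: Sample) -> str:
    """Collapse the mixed-source taxonomy to a real/fake decision."""
    return "real" if sample.source == "true" else "fake"

def accuracy(samples, predictions):
    """Fraction of predictions matching the binary ground truth."""
    correct = sum(binary_label(s) == p for s, p in zip(samples, predictions))
    return correct / len(samples)

# Toy evaluation: mock predictions standing in for an LVLM's zero-shot output.
samples = [
    Sample("img_001.jpg", "Flood hits the city center.", "true"),
    Sample("img_002.jpg", "Aliens land downtown.", "textual_veracity_distortion"),
    Sample("img_003.jpg", "Caption unrelated to image.", "cross_modal_consistency_distortion"),
]
preds = ["real", "fake", "real"]
print(accuracy(samples, preds))  # 2 of 3 correct
```

A per-source breakdown (grouping `accuracy` by `sample.source`) would show how a detector tuned for one forgery type degrades on the others, which is the failure mode the benchmark is designed to expose.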