MM FakeBench is a comprehensive benchmark for mixed-source multimodal misinformation detection (MMD). It addresses a key limitation of existing datasets, which typically assume a single source and type of forgery, by covering three critical distortion sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion. The benchmark spans 12 sub-categories of forgery types and totals 11,000 data pairs.
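To make the benchmark's structure concrete, the sketch below models a single record with the three distortion sources as a label field. The field names, label strings, and example sub-category are assumptions for illustration only, not the benchmark's released schema.

```python
# Hypothetical representation of one MM FakeBench-style record.
# Field names and label values are assumptions, not the released schema.
from dataclasses import dataclass
from typing import Literal

DistortionSource = Literal[
    "textual_veracity_distortion",
    "visual_veracity_distortion",
    "cross_modal_consistency_distortion",
    "none",  # genuine image-text pair
]

@dataclass
class MMFakeBenchSample:
    image_path: str            # path to the news image
    text: str                  # accompanying claim or caption
    source: DistortionSource   # which of the three distortion sources applies
    forgery_type: str          # one of the 12 forgery sub-categories
    label: Literal["real", "fake"]

# Example with made-up values; the sub-category name is illustrative.
sample = MMFakeBenchSample(
    image_path="images/000123.jpg",
    text="Floodwaters submerge downtown after record rainfall.",
    source="cross_modal_consistency_distortion",
    forgery_type="out-of-context pairing",
    label="fake",
)
```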
The paper evaluates 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MM FakeBench under a zero-shot setting. The results indicate that current methods struggle with mixed-source MMD, underscoring the need for more robust detectors. To address this, the authors propose MMD-Agent, a unified framework that integrates the rationale, action, and tool-use capabilities of LVLM agents, which significantly improves detection accuracy and generalization.
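As a rough illustration of the rationale-action-tool-use pattern, the sketch below organizes detection as one verification step per distortion source and then aggregates the results. The function names, tool stubs, and aggregation rule are hypothetical placeholders under assumed behavior; they do not reproduce MMD-Agent's actual implementation.

```python
# Minimal sketch of a rationale -> action -> tool-use style detector.
# All tool functions are placeholders (assumptions), not MMD-Agent itself.

def check_textual_veracity(text: str) -> bool:
    """Placeholder for a fact-checking / retrieval tool; True means the claim passes."""
    return True

def check_visual_veracity(image_path: str) -> bool:
    """Placeholder for a manipulated/AI-generated image detector; True means the image passes."""
    return True

def check_cross_modal_consistency(text: str, image_path: str) -> bool:
    """Placeholder for an LVLM check that the image content matches the claim."""
    return True

def detect(text: str, image_path: str) -> str:
    """Run one check per distortion source and aggregate into a verdict."""
    checks = {
        "textual_veracity": check_textual_veracity(text),
        "visual_veracity": check_visual_veracity(image_path),
        "cross_modal_consistency": check_cross_modal_consistency(text, image_path),
    }
    # Naive aggregation: the pair is judged fake if any check fails.
    return "real" if all(checks.values()) else "fake"

print(detect("Floodwaters submerge downtown.", "images/000123.jpg"))
```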
Key contributions of the study include:
1. Introducing MM FakeBench, the first comprehensive benchmark for mixed-source MMD.
2. Conducting extensive evaluations of detection methods and LVLMs on MM FakeBench.
3. Proposing MMD-Agent, a framework that improves detection performance and serves as a new baseline for future research.
The study aims to catalyze future research into more realistic mixed-source multimodal misinformation detection and to provide a fair evaluation platform for misinformation detection methods.