MMFakeBench is a comprehensive benchmark for detecting mixed-source multimodal misinformation, addressing the limitations of existing single-source detection methods. The benchmark covers three critical sources of misinformation: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, spanning 12 sub-categories of forgery types. It contains 11,000 image-text pairs, including 3,300 real pairs, and is divided into validation and test sets.

The benchmark evaluates 6 detection methods and 15 large vision-language models (LVLMs) in a zero-shot setting, revealing how challenging mixed-source detection is: current methods struggle with this complex task, highlighting the need for stronger detection capabilities. To address this, the paper proposes MMD-Agent, a unified framework that integrates the rationales, actions, and tool-use capabilities of LVLM agents. MMD-Agent decomposes mixed-source detection into three stages: a textual veracity check, a visual veracity check, and cross-modal consistency reasoning. This decomposition significantly improves detection accuracy and generalization, and MMD-Agent outperforms both existing detection methods and LVLMs on MMFakeBench, establishing a new baseline for future research. Together, the benchmark and framework provide a realistic evaluation of misinformation detection methods and underscore the importance of mixed-source multimodal misinformation detection.
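The three-stage decomposition described above can be sketched as a simple sequential pipeline. This is a minimal illustrative sketch, not the paper's implementation: the function names, the `Verdict` type, and the stub checker signatures are all assumptions introduced here; in MMD-Agent each stage would be carried out by an LVLM agent with tool use rather than a plain Python function.

```python
# Hypothetical sketch of MMD-Agent's three-stage decomposition.
# All names and signatures here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    label: str      # "real" or the distortion source that was detected
    rationale: str  # short explanation for the decision


def mmd_agent(
    text_check: Callable[[str], bool],        # stage 1: textual veracity
    visual_check: Callable[[str], bool],      # stage 2: visual veracity
    consistency_check: Callable[[str, str], bool],  # stage 3: cross-modal consistency
    text: str,
    image: str,
) -> Verdict:
    """Run the three stages in order and stop at the first failed check."""
    if not text_check(text):
        return Verdict("textual veracity distortion", "claim fails the fact check")
    if not visual_check(image):
        return Verdict("visual veracity distortion", "image appears manipulated")
    if not consistency_check(text, image):
        return Verdict("cross-modal consistency distortion", "text and image disagree")
    return Verdict("real", "all three checks passed")
```

With stub checkers, a pair whose text and image pass individually but disagree with each other would be flagged at the third stage, mirroring how the staged design attributes misinformation to its source rather than producing only a binary label.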