FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs


October 28-November 1, 2024 | Xuannan Liu, Peipei Li*, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He
FKA-Owl is a novel framework that enhances Large Vision-Language Models (LVLMs) with forgery-specific knowledge to improve multimodal fake news detection. The framework incorporates two types of forgery-specific knowledge: semantic correlations between text and images, and artifact traces from image manipulation. These are injected into the LVLM through two specialized modules: a cross-modal reasoning module and a visual-artifact localization module. The cross-modal reasoning module uses dual-branch cross-attention to extract semantic correlations between the two modalities, while the visual-artifact localization module pinpoints precise visual artifacts using sparse bounding boxes and detailed mask regions. The encoded knowledge is then mapped into the language space of the LVLM for deep manipulation reasoning. Extensive experiments on public benchmarks show that FKA-Owl achieves superior cross-domain performance compared with previous methods. The framework leverages the rich world knowledge of LVLMs and augments it with the domain-specific knowledge crucial for identifying multimodal fake news, demonstrating its effectiveness in handling domain shift and improving fake news detection across different domains.
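To make the dual-branch cross-attention idea concrete, here is a minimal numpy sketch. The abstract does not specify the exact formulation, so the dimensions, pooling, and function names below (`cross_attention`, `correlation`) are illustrative assumptions, not the paper's implementation: one branch lets text tokens attend to image patches, the other lets image patches attend to text tokens, and the two pooled outputs form a semantic-correlation feature.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    # scaled dot-product cross-attention: queries from one modality,
    # keys/values from the other (toy version, no learned projections)
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

rng = np.random.default_rng(0)
d = 64
text = rng.standard_normal((8, d))    # 8 text-token embeddings (toy data)
image = rng.standard_normal((16, d))  # 16 image-patch embeddings (toy data)

# dual branches: text attends to image, and image attends to text
text_to_image = cross_attention(text, image, d)   # shape (8, 64)
image_to_text = cross_attention(image, text, d)   # shape (16, 64)

# pool each branch and concatenate into one semantic-correlation embedding
correlation = np.concatenate([text_to_image.mean(0), image_to_text.mean(0)])
print(correlation.shape)  # (128,)
```

In a real model the queries, keys, and values would pass through learned linear projections and multiple heads; the sketch only shows the dual-branch routing of information between modalities.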
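The final step, mapping encoded forgery knowledge into the LVLM's language space, can be sketched as a learned linear projector that turns forgery embeddings into soft-prompt tokens prepended to the text input. The dimensions and the projector design below are assumptions for illustration (the abstract does not state them), in the spirit of common vision-to-language adapters:

```python
import numpy as np

rng = np.random.default_rng(1)
d_forgery, d_llm = 128, 4096  # hypothetical sizes; not given in the paper summary

# hypothetical linear projector from the forgery-feature space
# into the LVLM's word-embedding space (in practice this would be trained)
W = rng.standard_normal((d_forgery, d_llm)) / np.sqrt(d_forgery)

# e.g. one semantic-correlation token and one artifact-trace token
forgery_embeddings = rng.standard_normal((2, d_forgery))

# project to soft-prompt tokens the language model can consume
soft_prompts = forgery_embeddings @ W
print(soft_prompts.shape)  # (2, 4096)
```

The LVLM would then attend over these projected tokens alongside the ordinary text prompt when reasoning about manipulation.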