Unified Hallucination Detection for Multimodal Large Language Models

The paper addresses hallucination detection in Multimodal Large Language Models (MLLMs), which can generate content that contradicts the input data or established world knowledge. The authors propose a meta-evaluation benchmark, MHaluBench, and a unified multimodal hallucination detection framework, UNIHD. MHaluBench is designed to assess progress in hallucination detection methods and covers multiple hallucination categories and multimodal tasks. UNIHD uses a suite of auxiliary tools to validate whether hallucinations have occurred. The framework has four stages: essential claim extraction, autonomous tool selection, parallel tool execution, and hallucination verification with rationales. Experimental results demonstrate the effectiveness of UNIHD, with superior performance on both image-to-text and text-to-image generation tasks. The paper also discusses limitations and future directions, emphasizing the need for further research to broaden the scope of hallucination detection and improve tool accuracy.
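
The four stages named above (claim extraction, tool selection, parallel tool execution, verification with rationales) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: the tool functions, the keyword rule for tool selection, the substring check used for verification, and every name in the code (`detect`, `TOOLS`, `Verdict`, and so on) are hypothetical stand-ins for the models and prompts a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    claim: str
    hallucinated: bool
    rationale: str


# Hypothetical auxiliary tools. Each returns a list of evidence strings.
# Here the "image" is a plain dict of annotations (an assumption of this sketch).
def object_tool(image: Dict[str, List[str]]) -> List[str]:
    return image.get("objects", [])


def scene_text_tool(image: Dict[str, List[str]]) -> List[str]:
    return image.get("text", [])


TOOLS = {"object": object_tool, "scene_text": scene_text_tool}


def extract_claims(response: str) -> List[str]:
    # Stage 1 (toy): treat each sentence as one claim.
    return [s.strip() for s in response.split(".") if s.strip()]


def select_tools(claim: str) -> List[str]:
    # Stage 2 (toy): a keyword rule decides which tools are relevant.
    return ["scene_text"] if "says" in claim.lower() else ["object"]


def run_tools(names: List[str], image: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Stage 3: submit every selected tool to a thread pool, then collect results.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(TOOLS[name], image) for name in names}
        return {name: fut.result() for name, fut in futures.items()}


def verify(claim: str, evidence: Dict[str, List[str]]) -> Verdict:
    # Stage 4 (toy): a claim counts as supported if any evidence item appears in it.
    hits = [e for items in evidence.values() for e in items if e in claim.lower()]
    if hits:
        return Verdict(claim, False, "supported by evidence: " + ", ".join(hits))
    return Verdict(claim, True, "no supporting evidence from " + ", ".join(evidence))


def detect(response: str, image: Dict[str, List[str]]) -> List[Verdict]:
    verdicts = []
    for claim in extract_claims(response):
        evidence = run_tools(select_tools(claim), image)
        verdicts.append(verify(claim, evidence))
    return verdicts


if __name__ == "__main__":
    image = {"objects": ["dog", "frisbee"], "text": ["stop"]}
    response = "A dog catches a frisbee. A cat sits nearby. The sign says stop."
    for v in detect(response, image):
        print(v.hallucinated, "|", v.claim, "|", v.rationale)
```

In this toy run, only the claim about the cat is flagged, because neither tool returns evidence mentioning it. The substring check is deliberately crude; it is there only to show where a verification step consumes the tool outputs and emits a rationale alongside its label.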

27 May 2024 | Xiang Chen, Chenxi Wang, Yida Xue, Ningyu Zhang, Xiaoyan Yang, Qiang Li, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen