This paper introduces a unified framework for detecting hallucinations in multimodal large language models (MLLMs), called UNIHD, along with a meta-evaluation benchmark, MHaluBench. Hallucination refers to the generation of content that contradicts input data or established knowledge. MLLMs are prone to hallucinations, which hinder their practical deployment and contribute to the spread of misinformation. Detecting hallucinations in MLLMs is crucial for ensuring their reliability and safety.
The paper addresses the limitations of existing hallucination detection methods, which often focus on a single task, cover only a narrow range of hallucination categories, and lack fine-grained analysis. To overcome these challenges, the authors propose MHaluBench, a benchmark that spans a diverse range of multimodal tasks and hallucination categories. MHaluBench is designed to evaluate the progress of hallucination detection methods and to support fine-grained, claim-level analysis.
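To make this claim-level granularity concrete, the following is a minimal sketch of how a benchmark entry could be represented. The field names (`task`, `claims`, `conflict_type`) and the enumeration of conflict categories are illustrative assumptions for this summary, not the released MHaluBench schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ConflictType(Enum):
    # Conflict categories mirroring those discussed in the paper;
    # the exact taxonomy used in MHaluBench may differ.
    OBJECT = "object"
    ATTRIBUTE = "attribute"
    SCENE_TEXT = "scene-text"
    FACT = "fact"


@dataclass
class Claim:
    """A single atomic claim extracted from a model response."""
    text: str
    hallucinatory: bool
    conflict_type: Optional[ConflictType] = None  # None if the claim is faithful


@dataclass
class BenchmarkEntry:
    """Hypothetical claim-level record for one multimodal example."""
    task: str                  # e.g. "image-to-text" or "text-to-image"
    image_path: str
    model_response: str
    claims: list = field(default_factory=list)

    @property
    def response_label(self) -> bool:
        """A whole response counts as hallucinatory if any claim is."""
        return any(c.hallucinatory for c in self.claims)
```

Structuring each example around atomic claims is what allows a detector to be scored both at the claim level and at the response level.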
The UNIHD framework is a tool-augmented approach that leverages multiple auxiliary tools to detect hallucinations. It consists of four main components: essential claim extraction, autonomous tool selection via query formulation, parallel tool execution, and hallucination verification with rationales. The framework is designed to identify and verify hallucinations at different levels, including object, attribute, scene-text, and fact-based conflicts.
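A condensed sketch of how such a tool-augmented pipeline can be wired together is shown below. The helper interface (a generic text-in/text-out `llm` callable and a `tools` mapping) and the prompt wording are assumptions made for illustration; they are not the authors' released implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def detect_hallucinations(
    image: str,
    response: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str, str], str]],
) -> List[dict]:
    """Tool-augmented hallucination detection in the spirit of UNIHD.

    llm   -- any text-in/text-out model used for claim extraction, tool
             routing, and verification (hypothetical interface).
    tools -- name -> function mapping, e.g. an object detector, an attribute
             VQA model, a scene-text OCR reader, or a fact-retrieval tool,
             each taking (image, query) and returning textual evidence.
    """
    # 1. Essential claim extraction: break the response into atomic claims.
    claims = [c for c in llm(f"List the atomic claims in: {response}").splitlines() if c]

    results = []
    for claim in claims:
        # 2. Autonomous tool selection via query formulation: ask the LLM
        #    which tools to call and with what queries.
        plan = llm(
            f"Claim: {claim}\nAvailable tools: {list(tools)}\n"
            "Return lines of the form '<tool>: <query>'."
        )
        calls = [line.split(":", 1) for line in plan.splitlines() if ":" in line]
        calls = [(name.strip(), query.strip()) for name, query in calls if name.strip() in tools]

        # 3. Parallel tool execution: gather evidence from the chosen tools.
        with ThreadPoolExecutor() as pool:
            evidence = list(pool.map(lambda c: tools[c[0]](image, c[1]), calls))

        # 4. Hallucination verification with rationales: judge the claim
        #    against the collected evidence.
        verdict = llm(
            f"Claim: {claim}\nEvidence: {evidence}\n"
            "Is the claim hallucinatory? Answer yes or no and explain why."
        )
        results.append({"claim": claim, "evidence": evidence, "verdict": verdict})

    return results
```

Because the tool set is passed in as a plain mapping, new detectors or retrieval strategies can be plugged in without changing the pipeline, which is consistent with the tool-agnostic design noted below.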
The authors conducted extensive experiments on MHaluBench, demonstrating the effectiveness of UNIHD in detecting hallucinations. The results show that UNIHD outperforms existing methods in both image-to-text and text-to-image generation tasks. The framework is also tool-agnostic, allowing for the integration of new tools and detection strategies to enhance hallucination verification.
The paper also discusses the challenges of hallucination detection in MLLMs, including the difficulty of detecting certain types of hallucinations and the need for more comprehensive benchmarks. The authors highlight the importance of developing robust and reliable hallucination detection methods to ensure the safety and effectiveness of MLLMs in practical applications.