Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

28 Jun 2024 | Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan
This paper introduces LogicCheckGPT, a novel framework for detecting and mitigating object hallucinations in large vision-language models (LVLMs). Object hallucination refers to the phenomenon where an LVLM describes objects that do not exist in the image. Existing mitigation methods either require significant computational resources or depend on external models. LogicCheckGPT instead leverages the logical consistency of the LVLM's own responses to identify and correct hallucinations. The framework proceeds in five steps: object extraction, object-to-attribute inquiring, attribute-to-object inquiring, logical closed-loop checking, and hallucination detection and mitigation.

LogicCheckGPT is a plug-and-play, training-free method that can be applied to various LVLMs without additional training or external models. Comprehensive experiments on multiple benchmarks show substantial gains, including a 31.33% improvement on the POPE dataset for mPLUG-Owl and a 10.00% improvement for MiniGPT-4. Because the checks are carried out through natural-language questions and answers, the method is also interpretable, making it an effective and generalizable solution for improving the reliability of LVLMs.
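To make the five-step procedure concrete, the sketch below walks through one pass of the check in Python. It assumes a generic `query_lvlm(prompt)` callable that is already bound to a specific image; the function names, prompt wording, and the simple substring heuristic in `closes_loop` are illustrative assumptions, not the paper's exact prompts or scoring rules.

```python
from typing import Callable, Dict, List

# A minimal sketch of the logical-closed-loop check, assuming query_lvlm is a
# user-supplied function that sends a text prompt (about a fixed image) to the
# LVLM and returns its textual answer. All prompts here are hypothetical.

def extract_objects(query_lvlm: Callable[[str], str], description: str) -> List[str]:
    """Step 1: object extraction - ask which objects the description mentions."""
    answer = query_lvlm(
        f"List the objects mentioned in this description, comma-separated: {description}"
    )
    return [obj.strip() for obj in answer.split(",") if obj.strip()]

def object_to_attribute(query_lvlm: Callable[[str], str], obj: str) -> str:
    """Step 2: object-to-attribute inquiring - probe an attribute of the object."""
    return query_lvlm(f"What color is the {obj} in the image?")

def attribute_to_object(query_lvlm: Callable[[str], str], attribute_answer: str) -> str:
    """Step 3: attribute-to-object inquiring - ask the inverse question."""
    return query_lvlm(f"Which object in the image is described by: {attribute_answer}?")

def closes_loop(obj: str, roundtrip_answer: str) -> bool:
    """Step 4: logical closed-loop checking - the loop closes if the inverse
    answer points back to the original object (naive substring match here)."""
    return obj.lower() in roundtrip_answer.lower()

def check_description(query_lvlm: Callable[[str], str], description: str) -> Dict[str, str]:
    """Step 5: hallucination detection - flag objects whose loop fails as
    likely hallucinations; a real system would then revise the description."""
    verdicts: Dict[str, str] = {}
    for obj in extract_objects(query_lvlm, description):
        attribute = object_to_attribute(query_lvlm, obj)
        roundtrip = attribute_to_object(query_lvlm, attribute)
        verdicts[obj] = "consistent" if closes_loop(obj, roundtrip) else "possible hallucination"
    return verdicts
```

In practice the closed-loop judgment would need more robust answer matching than a substring check, and the mitigation step would rewrite the original description to drop flagged objects; this sketch only illustrates the control flow of asking forward and inverse questions and testing whether they agree.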