Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models


28 Jun 2024 | Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan
The paper "Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models" addresses the issue of object hallucination in large vision-language models (LVLMs), which refers to the phenomenon where LVLMs generate descriptions of non-existent objects in images. The authors propose a novel framework called LogicCheckGPT, which leverages logical consistency probing to detect and mitigate object hallucinations. The framework involves five steps: object extraction, object-to-attribute inquiring, attribute-to-object inquiring, logical closed loop checking, and hallucination detection and mitigation. By asking logical questions about objects and their attributes, the framework can identify hallucinated objects based on the logical consistency of the model's responses. Comprehensive experiments on multiple benchmarks and LVLMs demonstrate the effectiveness and generality of the proposed method, showing significant improvements over existing approaches. The main contributions of the work include the first adoption of logical closed loops for object hallucination detection and mitigation, a training-free framework, and its superior performance across various LVLMs.The paper "Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models" addresses the issue of object hallucination in large vision-language models (LVLMs), which refers to the phenomenon where LVLMs generate descriptions of non-existent objects in images. The authors propose a novel framework called LogicCheckGPT, which leverages logical consistency probing to detect and mitigate object hallucinations. The framework involves five steps: object extraction, object-to-attribute inquiring, attribute-to-object inquiring, logical closed loop checking, and hallucination detection and mitigation. By asking logical questions about objects and their attributes, the framework can identify hallucinated objects based on the logical consistency of the model's responses. Comprehensive experiments on multiple benchmarks and LVLMs demonstrate the effectiveness and generality of the proposed method, showing significant improvements over existing approaches. The main contributions of the work include the first adoption of logical closed loops for object hallucination detection and mitigation, a training-free framework, and its superior performance across various LVLMs.