2024 | Zhaorun Chen*¹, Zhuokai Zhao*¹, Hongyin Luo², Huaxiu Yao³, Bo Li¹˒⁴, Jiawei Zhou⁵
**HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding**
**Abstract:**
Large vision-language models (LVLMs) excel in interpreting multi-modal contexts but often suffer from object hallucinations (OH). HALC is a novel decoding algorithm designed to mitigate OH in LVLMs. It integrates an auto-focal grounding mechanism to correct hallucinated tokens and a specialized beam search algorithm to reduce OH while preserving text generation quality. HALC can be easily integrated into any LVLM without additional training. Extensive experiments demonstrate HALC's effectiveness, outperforming state-of-the-art methods across four benchmarks.
**Introduction:**
OH is a persistent challenge in LVLMs: the model describes objects that are not in the image, or gets their attributes or relationships wrong. Accordingly, OH can be categorized into three types: existence, attribute, and relationship hallucinations. Existing methods often require powerful external LVLMs or additional data, limiting their adaptability. HALC addresses all three types of OH by employing an adaptive focal-contrast grounding mechanism and a matching-based beam search algorithm.
**Methodology:**
HALC operates at the token level, using fine-grained visual information to correct hallucinations. It identifies object-related tokens, retrieves visual contexts, and uses a focal-contrast grounding mechanism to adjust token probabilities. A matching-based beam search algorithm maintains text generation quality while reducing OH.
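To make the probability-adjustment step concrete, here is a minimal sketch of a focal-contrast style adjustment. It is not the authors' implementation. It assumes that the model has already produced next-token logits for the same text prefix under several candidate visual contexts (crops, or fields of view, around the object in question). It further assumes that the most informative pair of contexts is the one whose token distributions diverge most under Jensen-Shannon divergence, and that the pair is contrasted in both directions because it is not known in advance which crop is better grounded. The names `focal_contrast`, `logits_by_fov`, and `alpha` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def jsd(p, q):
    """Jensen-Shannon divergence between two distributions (natural log)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def focal_contrast(logits_by_fov, alpha=1.0):
    """Select the pair of visual contexts that disagree most and contrast them.

    logits_by_fov: one list of next-token logits per candidate visual context
                   (hypothetical input; producing it requires an LVLM).
    alpha:         contrast strength (assumed hyperparameter).
    Returns ((i, j), scores_i_over_j, scores_j_over_i).
    """
    probs = [softmax(l) for l in logits_by_fov]
    n = len(probs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = max(pairs, key=lambda ij: jsd(probs[ij[0]], probs[ij[1]]))

    def contrast(pos, neg):
        # Amplify tokens favoured by `pos` and penalise tokens favoured by `neg`.
        return [(1 + alpha) * math.log(a) - alpha * math.log(b)
                for a, b in zip(pos, neg)]

    return (i, j), contrast(probs[i], probs[j]), contrast(probs[j], probs[i])


# Toy example: three contexts over a three-token vocabulary.
logits_by_fov = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
pair, forward, backward = focal_contrast(logits_by_fov)
```

In a full decoder, the two contrasted score vectors would each propose a candidate token, and a separate selection step (in HALC, the matching-based beam search) would decide which candidate to keep.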
**Experiments:**
HALC is evaluated on benchmarks including CHAIR, POPE, MME, and LLaVA-Bench. Results show that HALC significantly reduces OH while maintaining high-quality text generation. It outperforms existing methods in terms of both quantitative metrics and qualitative assessments.
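As a brief illustration of how object hallucination is commonly quantified, the sketch below computes the two standard CHAIR scores: the instance-level score (the fraction of mentioned objects that are not in the image) and the sentence-level score (the fraction of captions containing at least one such object). It assumes that object mentions have already been extracted from each caption and mapped to the same vocabulary as the ground-truth annotations. The function name `chair_scores` and the toy data are illustrative and are not taken from the paper.

```python
def chair_scores(mentioned, ground_truth):
    """Compute (CHAIR_i, CHAIR_s) for a batch of captions.

    mentioned:    list of lists, objects mentioned in each caption
    ground_truth: list of sets, objects actually present in each image
    """
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for objs, truth in zip(mentioned, ground_truth):
        bad = [o for o in objs if o not in truth]
        total_mentions += len(objs)
        hallucinated_mentions += len(bad)
        captions_with_hallucination += bool(bad)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_hallucination / len(mentioned) if mentioned else 0.0
    return chair_i, chair_s


# Toy data: the second caption mentions a lamp that is not in the image.
mentioned = [["dog", "frisbee"], ["cat", "sofa", "lamp"]]
ground_truth = [{"dog", "frisbee"}, {"cat", "sofa"}]
chair_i, chair_s = chair_scores(mentioned, ground_truth)
```

Lower values are better for both scores; a decoding method that reduces OH should lower them without shortening or degrading the captions.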
**Conclusion:**
HALC is a novel decoding algorithm that effectively reduces OH in LVLMs. It integrates fine-grained visual information and a specialized beam search algorithm, demonstrating superior performance and robustness. A benchmarking tool supports comprehensive comparisons across OH reduction strategies.