HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding


2024 | Zhaorun Chen*, Zhuokai Zhao*, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou
**Abstract:** Large vision-language models (LVLMs) excel at interpreting multi-modal contexts but often suffer from object hallucinations (OH). HALC is a novel decoding algorithm designed to mitigate OH in LVLMs. It integrates an adaptive focal-contrast grounding mechanism to correct hallucinated tokens and a specialized beam search algorithm to reduce OH while preserving text generation quality. HALC can be integrated into any LVLM without additional training. Extensive experiments demonstrate HALC's effectiveness, outperforming state-of-the-art methods across four benchmarks.

**Introduction:** OH is a persistent challenge in LVLMs: models generate object descriptions that are inconsistent with the visual input. OH can be categorized into three types: existence, attribute, and relationship hallucinations. Existing mitigation methods often rely on powerful external LVLMs or additional data, limiting their adaptability. HALC addresses all three types of OH by combining an adaptive focal-contrast grounding mechanism with a matching-based beam search algorithm.

**Methodology:** HALC operates at the token level, using fine-grained visual information to correct hallucinations as they arise. At each decoding step, it identifies object-related tokens, retrieves the visual contexts corresponding to those objects, and applies the focal-contrast grounding mechanism, which contrasts the model's predictions under different visual contexts of the same region, to adjust next-token probabilities away from hallucinated content. A matching-based beam search algorithm then maintains text generation quality at the sequence level while further reducing OH.
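The token-level contrast can be illustrated with a short sketch. The Python example below assumes the contrast takes the common contrastive-decoding form, amplifying the difference between logits conditioned on a tighter (focal) view of the grounded region and a broader view; the function names, the `alpha` weight, and the way the two views are produced are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def contrast_logits(logits_focal: np.ndarray,
                    logits_broad: np.ndarray,
                    alpha: float = 1.0) -> np.ndarray:
    """Contrast next-token logits conditioned on two visual contexts
    of the same grounded region.

    Tokens that gain probability when the view tightens on the object
    (focal) relative to a broader view are boosted; tokens that only
    score well under the broad, less-grounded view are suppressed.
    `alpha` controls contrast strength (illustrative assumption).
    """
    return (1.0 + alpha) * logits_focal - alpha * logits_broad

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

# Toy example: vocabulary of 4 tokens; token 2 is "hallucinated"
# (likely under the broad view, unsupported by the focused region).
logits_focal = np.array([1.0, 2.5, 0.2, 0.5])
logits_broad = np.array([1.0, 2.0, 2.3, 0.5])

p = softmax(contrast_logits(logits_focal, logits_broad, alpha=1.0))
print(p.round(3))  # probability mass shifts away from token 2
```

In this toy run, the contrast pushes the distribution toward the token supported by the focused view, which is the intended effect of grounding the decoding step in fine-grained visual evidence.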
**Experiments:** HALC is evaluated on four benchmarks: CHAIR, POPE, MME, and LLaVA-Bench. Results show that HALC significantly reduces OH while maintaining high-quality text generation, outperforming existing methods on both quantitative metrics and qualitative assessments.

**Conclusion:** HALC is a decoding algorithm that effectively reduces OH in LVLMs by integrating fine-grained visual information with a specialized beam search, demonstrating superior performance and robustness. An accompanying benchmarking tool supports comprehensive comparisons across OH reduction strategies.
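The matching-based beam search can likewise be sketched at a high level. The snippet below ranks candidate beams by a weighted combination of language-model log-probability and a text-image matching score; the `match_score` callback (e.g., an off-the-shelf image-text matcher) and the linear weighting are illustrative assumptions rather than HALC's exact procedure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Beam:
    text: str        # decoded sequence so far
    logprob: float   # cumulative language-model log-probability

def rerank_beams(beams: List[Beam],
                 match_score: Callable[[str], float],
                 beam_width: int,
                 weight: float = 1.0) -> List[Beam]:
    """Keep the `beam_width` candidates that best balance fluency
    (LM log-probability) and visual faithfulness (matching score).

    `match_score(text)` is assumed to return a higher value when the
    text agrees with the image; the linear combination is an
    illustrative choice, not the paper's exact scoring rule.
    """
    scored = sorted(beams,
                    key=lambda b: b.logprob + weight * match_score(b.text),
                    reverse=True)
    return scored[:beam_width]

# Toy usage with a stub matcher that penalizes a hallucinated object.
stub_match = lambda t: -5.0 if "surfboard" in t else 0.0
beams = [Beam("a dog on the beach", -4.2),
         Beam("a dog with a surfboard", -3.9)]
print([b.text for b in rerank_beams(beams, stub_match, beam_width=1)])
# -> ['a dog on the beach']
```

Here the slightly more fluent but hallucinated beam is pruned because its matching score is poor, which mirrors how a sequence-level matching signal can reduce OH without sacrificing generation quality.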