IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding

28 Feb 2024 | Lanyun Zhu, Deyi Ji, Tianrun Chen, Peng Xu, Jieping Ye, Jun Liu
This paper addresses the issue of hallucinations in Large Vision-Language Models (LVLMs) by introducing a novel technique called Image-Biased Decoding (IBD). Hallucinations occur when an LVLM generates content that is unrelated to or inconsistent with the given image and text inputs. IBD alleviates this problem by contrasting the predictions of a conventional LVLM with those of an image-biased LVLM, which is derived from the original model by modifying the attention weight matrix to emphasize visual information while reducing the influence of textual context. The contrast amplifies correct predictions that are highly correlated with image content and suppresses hallucinatory errors caused by over-reliance on text. The method's reliability is supported by statistical analysis, and an adaptive adjustment strategy handles varying decoding conditions. Experimental results across multiple evaluation metrics demonstrate that IBD significantly reduces hallucinations and improves the truthfulness of generated responses with minimal additional parameters and data requirements.
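Schematically, this style of decoding follows the standard contrastive-decoding pattern: run two forward passes per step (conventional and image-biased) and combine their next-token distributions. The sketch below is a minimal illustration under that assumption, not the paper's exact formulation; the attention re-weighting that produces the image-biased pass, the adaptive adjustment strategy, and all names here (`image_biased_contrast`, `alpha`, the two logit tensors) are simplifications or hypotheticals.

```python
import torch
import torch.nn.functional as F

def image_biased_contrast(base_logits: torch.Tensor,
                          biased_logits: torch.Tensor,
                          alpha: float = 1.0) -> torch.Tensor:
    """Combine next-token logits from the conventional model and the
    image-biased model (hypothetical combination rule, not the paper's
    exact formula).

    Tokens that the image-biased pass prefers more strongly than the
    base pass are amplified; the rest are suppressed. alpha controls the
    contrast strength: alpha = 0 recovers the image-biased model alone.
    """
    base_logp = F.log_softmax(base_logits, dim=-1)
    biased_logp = F.log_softmax(biased_logits, dim=-1)
    # Amplify the image-grounded signal, subtract the text-driven prior.
    return (1.0 + alpha) * biased_logp - alpha * base_logp

# Schematic greedy step: both passes share weights; the biased pass is
# assumed to up-weight attention to image tokens before the softmax.
base_logits = torch.randn(1, 32000)    # stand-in for model(ctx) logits
biased_logits = torch.randn(1, 32000)  # stand-in for biased-pass logits
next_token = image_biased_contrast(base_logits, biased_logits).argmax(dim=-1)
```

In a full decoder, this combination would be applied at every generation step, and the paper's adaptive strategy would modulate (or bypass) the contrast in conditions where the image-biased model is less reliable.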