Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

13 Feb 2024 | Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu
The paper addresses object hallucination in Large Vision-Language Models (LVLMs), where a model describes objects that do not exist in the image. Existing solutions, such as fine-tuning on curated datasets or using advanced Large Language Models (LLMs) like GPT-3.5, are either costly or require API access. To tackle this, the authors introduce MARINE (Mitigating hallucinAtion via classifieR-Free guIdaNcE), a training-free and API-free framework that enriches the visual context of LVLMs by integrating existing open-source vision models and employs classifier-free guidance to improve the precision of object descriptions. In evaluations on six popular LVLMs, MARINE reduces hallucinations and improves the level of detail in generated responses, outperforming existing fine-tuning-based methods. The framework is compatible with any vision model and projection function, and its performance is further validated through ablation studies and examples illustrating the impact of guidance strength and noise intensity on hallucination reduction.
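To make the role of guidance strength more concrete, the classifier-free guidance step mentioned above can be sketched as a mix of two next-token distributions: one conditioned on the extra visual context and one without it. This is a minimal illustration of the standard classifier-free guidance form, not the authors' implementation. The function names, the toy logits, and the parameter `gamma` (standing in for guidance strength) are all assumptions made for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def guided_log_probs(cond_logits, uncond_logits, gamma):
    """Hypothetical helper: combine two next-token distributions.

    cond_logits:   logits computed WITH the extra visual context (assumed input)
    uncond_logits: logits computed WITHOUT it (assumed input)
    gamma:         guidance strength; 0 ignores the guidance,
                   1 uses only the conditioned distribution,
                   values above 1 push further toward it
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    # Standard classifier-free guidance mix in log-probability space.
    mixed = [gamma * c + (1.0 - gamma) * u for c, u in zip(cond, uncond)]
    # Renormalize so the result is a valid distribution again.
    return log_softmax(mixed)

# Toy 3-token vocabulary: the conditioned model favors token 0,
# the unconditioned model has no preference.
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 1.0]
for gamma in (0.0, 1.0, 1.5):
    probs = [math.exp(lp) for lp in guided_log_probs(cond, uncond, gamma)]
    print(gamma, [round(p, 3) for p in probs])
```

In this toy setup, raising `gamma` shifts probability mass toward the token favored by the conditioned distribution, which is the intuition behind tuning guidance strength.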