ZECHEN BAI, Show Lab, National University of Singapore, Singapore
PICHAO WANG, Amazon AGI, USA
TIANJUN XIAO, AWS Shanghai AI Lab, China
TONG HE, AWS Shanghai AI Lab, China
ZONGBO HAN, Show Lab, National University of Singapore, Singapore
ZHENG ZHANG, AWS Shanghai AI Lab, China
MIKE ZHENG SHOU*, Show Lab, National University of Singapore, Singapore
This survey provides a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as large vision-language models (LVLMs). Despite their significant advancements and capabilities in multimodal tasks, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination. The survey reviews recent advances in identifying, evaluating, and mitigating these hallucinations, offering insights into their underlying causes, evaluation benchmarks, metrics, and mitigation strategies. It also discusses current challenges and limitations, formulating open questions for future research. The survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Key contributions include a detailed classification of hallucinations, a review of evaluation benchmarks and metrics, and an analysis of mitigation strategies. The survey is organized into sections covering definitions, hallucination causes, metrics and benchmarks, and mitigation approaches. It concludes with a discussion of the current state of the field and future directions.