April 2025 | Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou
This survey provides a comprehensive analysis of hallucination in multimodal large language models (MLLMs), also known as large vision-language models (LVLMs). While MLLMs have made significant progress on multimodal tasks, they often generate outputs inconsistent with the visual content, a failure mode known as hallucination. This issue obstructs practical deployment and raises concerns about reliability. The survey reviews recent advances in identifying, evaluating, and mitigating hallucinations, offering a detailed overview of underlying causes, evaluation benchmarks, metrics, and mitigation strategies, and it analyzes current challenges and limitations, formulating open questions for future research.

By classifying hallucination causes, evaluation benchmarks, and mitigation methods, the survey aims to deepen understanding of hallucinations in MLLMs and inspire further advances. It traces the causes of hallucination to four factors: data, model, training, and inference. It then presents metrics and benchmarks for evaluating hallucinations, including CHAIR, POPE, MME, CIEM, MMHal-Bench, GAVIE, NOPE, HaELM, FaithScore, Bingo, AMBER, RAH-Bench, HallusionBench, CCEval, MERLIM, FGHE, OpenCHAIR, Hal-Eval, CorrelationQA, VQAv2-IDK, MHaluBench, and VHTest. These benchmarks assess different aspects of hallucination, including object, attribute, relation, and event hallucination, as well as robustness to spurious visual inputs.

The survey contributes to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners.
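To make the metric list above concrete, here is a minimal sketch of a CHAIR-style object-hallucination computation: the instance-level score (CHAIR_i) is the fraction of mentioned objects that are not actually present in the image, and the sentence-level score (CHAIR_s) is the fraction of captions containing at least one such object. The sketch assumes object mentions have already been extracted from each caption and normalized against the image's ground-truth object labels (in the original CHAIR setup, via synonym matching against COCO categories); the function name and data layout below are illustrative, not taken from the survey.

```python
def chair_scores(captions_objects, ground_truth_objects):
    """Compute CHAIR_i (instance-level) and CHAIR_s (sentence-level).

    captions_objects: list of sets, objects mentioned in each generated caption
    ground_truth_objects: list of sets, objects actually present in each image
    (Both are assumed to use the same normalized object vocabulary.)
    """
    total_mentioned = 0
    total_hallucinated = 0
    captions_with_hallucination = 0

    for mentioned, present in zip(captions_objects, ground_truth_objects):
        hallucinated = mentioned - present  # mentioned but absent from the image
        total_mentioned += len(mentioned)
        total_hallucinated += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1

    chair_i = total_hallucinated / max(total_mentioned, 1)
    chair_s = captions_with_hallucination / max(len(captions_objects), 1)
    return chair_i, chair_s


# Toy example: the second caption hallucinates a "dog"
mentioned = [{"person", "surfboard"}, {"cat", "dog", "sofa"}]
present = [{"person", "surfboard", "wave"}, {"cat", "sofa"}]
print(chair_scores(mentioned, present))  # -> (0.2, 0.5)
```

Other benchmarks in the list take a different route: POPE, for instance, replaces free-form caption scoring with polling-style yes/no questions ("Is there a <object> in the image?") and reports standard classification metrics such as accuracy and F1.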