20 May 2024 | Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha
This survey paper provides a comprehensive overview of the challenges and recent advancements in addressing hallucination in foundation models (FMs) across text, image, video, and audio modalities. Hallucination refers to the generation of content that appears plausible but is factually inaccurate or unsupported by the given context, posing significant risks in critical applications. The paper categorizes hallucinations into several types, including contextual disconnection, semantic distortion, content hallucination, and factual inaccuracy. It highlights the need for detection and mitigation strategies to ensure the reliability and accuracy of FMs.
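To make the taxonomy concrete, the small data structure below encodes the four categories named above. This is only an illustrative sketch, not the paper's formal definitions; the one-line descriptions are paraphrases added for this example.

```python
from enum import Enum

class HallucinationType(Enum):
    """Illustrative labels for the four hallucination categories named above.
    The descriptions are paraphrases for this sketch, not the survey's definitions."""
    CONTEXTUAL_DISCONNECTION = "output drifts away from the given prompt or context"
    SEMANTIC_DISTORTION = "the meaning of the source content is altered or misrepresented"
    CONTENT_HALLUCINATION = "details appear that have no support in the input"
    FACTUAL_INACCURACY = "claims contradict verifiable facts"

# Example: tagging a flagged model output during manual review
flagged = {
    "response": "The report was published in 1875 by the committee.",
    "label": HallucinationType.FACTUAL_INACCURACY,
}
print(flagged["label"].name, "->", flagged["label"].value)
```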
The survey covers existing research on large language models (LLMs), large vision-language models (LVLMs), large video models (LVMs), and large audio models (LAMs). Key contributions include a structured taxonomy of hallucination, identification of the factors that contribute to it, and a presentation of detection and mitigation techniques. The paper also discusses the benchmark datasets and evaluation metrics used to assess hallucination detection and mitigation performance.
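As a minimal sketch of the sampling-based family of detection methods that surveys of this kind cover (not the paper's own method), the snippet below flags a response whose content disagrees with independently sampled answers to the same prompt. The token-overlap proxy and the 0.3 threshold are arbitrary choices for illustration; practical detectors typically replace the overlap measure with stronger scorers such as entailment models, but keep the same sample-and-compare structure.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two responses (a crude similarity proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency_score(candidate: str, samples: list[str]) -> float:
    """Mean overlap of the candidate with re-sampled answers to the same prompt.
    Low agreement is treated as a warning sign of possible hallucination."""
    return sum(jaccard(candidate, s) for s in samples) / len(samples)

# Hypothetical usage: `samples` would come from re-querying the model at a higher temperature.
candidate = "The Eiffel Tower is located in Paris, France."
samples = [
    "The Eiffel Tower stands in Paris.",
    "It is located in Paris, the capital of France.",
]
score = consistency_score(candidate, samples)
print(f"consistency = {score:.2f} -> {'likely grounded' if score > 0.3 else 'possible hallucination'}")
```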
The authors emphasize the importance of addressing hallucination in critical domains such as healthcare, finance, and law, where reliable and accurate outputs are essential. They propose future directions, including curating data resources, automated evaluation, improved detection and mitigation techniques, and multimodal hallucination management. The paper concludes by highlighting the need for a nuanced understanding and strategic management of hallucination to maximize the utility of large models while mitigating the associated risks.