AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models


16 Jun 2024 | Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha
AUTOHALLUSION is an automatic benchmark-generation approach for large vision-language models (LVLMs) that creates diverse hallucination examples. It uses three strategies to induce hallucinations: abnormal object insertion, paired object insertion, and correlated object removal. These strategies probe the LVLM's language module for context cues and synthesize images accordingly: adding an abnormal object, keeping one object of a correlated pair while excluding the other, or removing an object closely tied to the context. AUTOHALLUSION then generates image-based questions whose ground-truth answers contradict the language module's prior. The model must overcome contextual biases and distractions to reach the correct answers; incorrect or inconsistent answers indicate hallucinations. This enables the creation of new benchmarks with minimal human effort, overcoming the fragility of hand-crafted benchmarks, and it reveals common failure patterns and reasons, providing insights for detecting, avoiding, or controlling hallucinations. Comprehensive evaluations of top-tier LVLMs, including GPT-4V, Gemini Pro Vision, Claude 3, and LLaVA-1.5, show success rates of 97.7% and 98.7% in inducing hallucinations on synthetic and real-world datasets, respectively, paving the way for a long battle against hallucinations.

The method is inspired by schema theory from cognitive science, which describes the human tendency to organize information based on past experiences. The three strategies expose common patterns and mechanisms by which hallucinations arise, providing critical insights for detecting, combating, avoiding, or controlling hallucinations in LVLMs. The pipeline consists of scene generation, image manipulation, question construction, and hallucination detection. Questions are constructed to induce potential hallucination cases, focusing mainly on object existence and spatial relations, and hallucinations are detected through the correctness and consistency of the answers generated by the victim LVLM.
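The sketch below illustrates, in minimal Python, how such a pipeline could be wired together. It is an interpretation of the description above, not the authors' implementation: the names (`Scene`, `edit_image`, `ask_lvlm`, the example objects) are hypothetical placeholders, and the image editor and victim LVLM are stand-in callables.

```python
# Minimal sketch of the four-stage pipeline summarized above (scene generation,
# image manipulation, question construction, hallucination detection).
# All names are hypothetical placeholders, not the authors' actual API.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Scene:
    image: object        # the scene image (e.g., a PIL image in a real pipeline)
    objects: List[str]   # objects the contextual prior associates with the scene

# --- image manipulation: the three strategies --------------------------------
def abnormal_object_insertion(scene: Scene, edit_image: Callable) -> Scene:
    """Add an object that conflicts with the scene context."""
    odd = "toothbrush"   # example: an out-of-context object for an office scene
    return Scene(edit_image(scene.image, add=[odd]), scene.objects + [odd])

def paired_object_insertion(scene: Scene, edit_image: Callable) -> Scene:
    """Add one object of a strongly correlated pair while excluding its partner."""
    kept, excluded = "fork", "knife"   # the prior expects these to co-occur
    return Scene(edit_image(scene.image, add=[kept], forbid=[excluded]),
                 scene.objects + [kept])

def correlated_object_removal(scene: Scene, edit_image: Callable) -> Scene:
    """Remove an object closely tied to the scene context."""
    target = scene.objects[0]
    return Scene(edit_image(scene.image, remove=[target]),
                 [o for o in scene.objects if o != target])

# --- question construction + hallucination detection -------------------------
def detect_hallucination(scene: Scene, ground_truth: Dict[str, bool],
                         ask_lvlm: Callable) -> Dict[str, bool]:
    """Probe object existence; flag a hallucination when an answer is incorrect
    (contradicts the edited image) or inconsistent (contradicts the model's own
    answer to a rephrased question)."""
    incorrect = inconsistent = False
    for obj, truth in ground_truth.items():
        a1 = ask_lvlm(scene.image, f"Is there a {obj} in this image? Answer yes or no.")
        a2 = ask_lvlm(scene.image, f"Does this image contain a {obj}? Answer yes or no.")
        incorrect |= (a1 == "yes") != truth
        inconsistent |= a1 != a2
    return {"incorrect": incorrect, "inconsistent": inconsistent}

# Toy usage with stubbed callables (no real image editor or LVLM involved):
if __name__ == "__main__":
    edit_image = lambda img, **kwargs: img    # stand-in image editor
    ask_lvlm = lambda img, q: "yes"           # stand-in victim model
    scene = abnormal_object_insertion(Scene("office.png", ["desk", "monitor"]), edit_image)
    print(detect_hallucination(scene, {"toothbrush": True, "knife": False}, ask_lvlm))
    # -> {'incorrect': True, 'inconsistent': False}
```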
AUTOHALLUSION achieves high success rates in inducing hallucinations on both synthetic and real-world data. Strategies that probe inserted objects achieve higher attack success rates than those probing absent objects, and questions probing the existence of objects are more effective at causing hallucinations than questions probing spatial relations. Among all victim LVLMs, GPT-4V-Turbo is the most robust to hallucination attacks. Attack success rates across all LVLMs are even higher on the real-world dataset than on synthetic data. The results demonstrate that probing with sequences of questions that vary the contextual information drawn from the image effectively disrupts the cognitive processing of LVLMs, outperforming strategies that rely on object removal to induce expectation violations.

Performance is evaluated with two metrics, Manipulation Attack Success Rate (MASR) and Conflict Attack Success Rate (CASR), both of which confirm the method's effectiveness in inducing hallucinations in LVLMs. AUTOHALLUSION differs from previous benchmarks in that it auto-generates hallucination cases, synthesizing visual hallucination examples through contextual influences. Its main limitation lies in the object insertion strategy.
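As a rough illustration of how these two rates could be tallied, the sketch below assumes (this is a reading of the summary, not the paper's exact definition) that MASR counts attacks flagged by incorrect answers and CASR counts attacks flagged by mutually inconsistent answers, using per-image records like those produced by the detection step sketched earlier.

```python
# Hedged sketch of the two attack-success-rate metrics named above; the paper's
# precise definitions may differ from this simplified reading.
from typing import Dict, List

def attack_success_rates(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """results: one record per attacked image, e.g.
    {"incorrect": True, "inconsistent": False}, as produced by the probing step."""
    n = len(results)
    if n == 0:
        return {"MASR": 0.0, "CASR": 0.0}
    masr = sum(r["incorrect"] for r in results) / n      # correctness-based successes
    casr = sum(r["inconsistent"] for r in results) / n   # consistency-based successes
    return {"MASR": masr, "CASR": casr}

# Example with made-up outcomes (both rates are 2/3 here):
records = [
    {"incorrect": True,  "inconsistent": True},
    {"incorrect": True,  "inconsistent": False},
    {"incorrect": False, "inconsistent": True},
]
print(attack_success_rates(records))
```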