3 Jul 2024 | Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung
The paper investigates whether Large Language Models (LLMs) can estimate their own hallucination risk before generating responses. Inspired by human self-awareness, the authors analyze the internal mechanisms of LLMs across 15 diverse Natural Language Generation (NLG) tasks and over 700 datasets. They find that LLM internal states indicate whether the query has been seen in training data and whether the LLM is likely to hallucinate. The study explores specific neurons, activation layers, and tokens that play a crucial role in LLMs' perception of uncertainty and hallucination risk. Using a probing estimator, the authors achieve an average hallucination estimation accuracy of 84.32% at runtime. The research highlights the potential for LLMs to provide proactive uncertainty estimates, which could serve as an early indicator for retrieval augmentation or a warning system. The findings contribute to understanding and mitigating hallucinations in LLMs, enhancing their reliability and trustworthiness.
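To make the idea of a probing estimator over internal states concrete, here is a minimal sketch (not the authors' exact method): it extracts a hidden-state feature for each query from a small stand-in model ("gpt2") and fits a simple logistic-regression probe to predict a binary hallucination label. The toy `queries`/`labels` data, the choice of the last-token hidden state, and the use of scikit-learn are all illustrative assumptions.

```python
# Sketch of a probing estimator on LLM internal states (illustrative only).
# Assumptions: "gpt2" stands in for the studied LLM; `queries`/`labels` are a
# hypothetical toy dataset (1 = the model hallucinated on this query).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def query_features(query: str, layer: int = -1) -> torch.Tensor:
    """Return the last-token hidden state of the query at the given layer."""
    inputs = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (batch, seq_len, hidden_dim), one per layer.
    return outputs.hidden_states[layer][0, -1, :]

# Hypothetical toy data: queries paired with 0/1 hallucination labels.
queries = ["Who wrote Hamlet?", "List the moons of the planet Zorblax."]
labels = [0, 1]

X = torch.stack([query_features(q) for q in queries]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# At run time, the probe estimates hallucination risk before generation.
new_query = "What is the capital of France?"
features = query_features(new_query).numpy().reshape(1, -1)
risk = probe.predict_proba(features)[0, 1]
print(f"Estimated hallucination risk: {risk:.2f}")
```

In this setup, the risk score could act exactly as the summary describes: as a trigger for retrieval augmentation or as a warning signal before the model commits to an answer.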