2024 | Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, Jieping Ye
This paper proposes a novel approach for detecting hallucinations in large language models (LLMs) by leveraging their internal states. The key idea is that the dense semantic information retained in these internal states is more useful for hallucination detection than the logit-level or language-level uncertainty estimates used by traditional methods. The proposed method, called INSIDE, has two main components: (1) an EigenScore metric that measures the semantic consistency of sampled responses via the eigenvalues of the covariance matrix of their sentence embeddings, and (2) a test-time feature clipping approach that truncates extreme activations in the internal states, reducing overconfident generations and making overconfident hallucinations easier to detect.
The EigenScore metric captures semantic divergence in the dense embedding space, which is more informative than existing uncertainty or consistency metrics that operate in logit or language space. It is defined via the logarithm of the determinant of the covariance matrix of sentence embeddings, which reflects the differential entropy in the embedding space: when the sampled responses agree semantically, the covariance matrix is near-degenerate and the score is low, whereas divergent responses yield larger eigenvalues and a higher score. Feature clipping, in turn, reduces the impact of extreme activations in the internal states, which can lead to overconfident generations; it is implemented by truncating the activations of the penultimate layer of the LLM to a bounded range.
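To make the metric concrete, here is a minimal NumPy sketch of an EigenScore-style computation. It assumes K responses have already been sampled and embedded from an internal layer of the model; the function name, the choice to center across responses, and the regularizer `alpha` are illustrative, not necessarily the paper's exact formulation.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Consistency score for K sampled responses from their sentence
    embeddings: higher means more semantic divergence, i.e. a stronger
    hallucination signal.

    embeddings: (K, d) array, one internal-state embedding per response.
    alpha: small regularizer that keeps the covariance matrix full rank.
    """
    K, _ = embeddings.shape
    # Center the embeddings across the K sampled responses.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K covariance (Gram) matrix of the sampled responses.
    cov = centered @ centered.T
    # Regularized log-determinant via the eigenvalue spectrum; this is
    # the differential-entropy-style quantity described above.
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    return float(np.log(eigvals).sum() / K)
```

Feature clipping can be sketched in the same hedged spirit. The percentile-based threshold selection below is an assumption for illustration; the summary only states that extreme penultimate-layer activations are truncated to a threshold.

```python
def clip_features(activations: np.ndarray, p: float = 0.99) -> np.ndarray:
    """Truncate extreme penultimate-layer activations to a bounded range.

    activations: (N, d) activations, used here both to estimate the
                 per-dimension thresholds and to be clipped (in practice
                 thresholds would be estimated once, then reused at
                 test time).
    p: percentile defining the clipping range (an assumed choice).
    """
    lo = np.quantile(activations, 1.0 - p, axis=0)
    hi = np.quantile(activations, p, axis=0)
    return np.clip(activations, lo, hi)
```

In a full pipeline, clipping would be applied to the penultimate-layer activations during generation, and the EigenScore would then be computed over the embeddings of the clipped, sampled responses.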
Extensive experiments on several popular LLMs and question-answering (QA) benchmarks demonstrate the effectiveness of the method. EigenScore outperforms existing approaches at hallucination detection, particularly on the CoQA and SQuAD datasets, and the feature clipping step yields a significant further improvement. Compared against state-of-the-art baselines, including Semantic Entropy, Shifting Attention to Relevance, and SelfCheckGPT, the proposed method consistently comes out ahead.
The paper also examines computational efficiency: because the method does not rely on a second large model to measure self-consistency, EigenScore is roughly 10 times more efficient than methods that do, and the overhead of feature clipping and the EigenScore computation itself is negligible. Evaluations on additional LLMs, including LLaMA2-7B and Falcon-7B, show consistently superior performance over competing methods. The paper concludes that leveraging the internal states of LLMs to capture semantic information is a promising approach to detecting hallucinations.