Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

10 Jun 2024 | Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu
This paper introduces MIND, an unsupervised training framework for real-time hallucination detection in large language models (LLMs), and HELM, a new benchmark for evaluating hallucination detection across multiple LLMs. Hallucination refers to the phenomenon where LLMs generate responses that are logically coherent but factually inaccurate. Existing detection methods are mostly post-processing techniques that are computationally intensive and limited in effectiveness. MIND instead leverages the internal states of LLMs to detect hallucinations in real time without requiring manual annotations, while HELM provides a comprehensive benchmark containing diverse LLM outputs together with the models' internal states during inference.

MIND is a lightweight framework that can be integrated into any existing Transformer-based LLM. It consists of two steps: automatic training data generation and hallucination classifier training. The training data is generated by truncating Wikipedia articles and letting the LLM produce continuation text. The hallucination classifier, a multilayer perceptron, is then trained on the LLM's contextualized embeddings, using the last token's embedding from the final layer for detection.

The framework is evaluated on the HELM benchmark, which covers six different LLMs and their internal states during text generation. Experiments show that MIND outperforms existing state-of-the-art methods and is both effective and efficient for real-time hallucination detection. The paper also discusses the limitations of the approach and directions for future work.
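To make the detection step concrete, the sketch below shows how a multilayer-perceptron classifier could score the last token's final-layer hidden state of a Transformer-based LLM, as the summary describes. This is a minimal illustration, not the paper's released code: the class name HallucinationMLP, the layer sizes, and the use of "gpt2" as a stand-in model are all assumptions for demonstration purposes.

```python
# Hedged sketch of MIND-style real-time detection: score the last token's
# final-layer contextualized embedding with a small MLP classifier.
# Class names, layer sizes, and the base model are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class HallucinationMLP(nn.Module):
    """Multilayer perceptron that maps one contextualized embedding to a score."""

    def __init__(self, hidden_size: int, mlp_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, mlp_dim),
            nn.ReLU(),
            nn.Linear(mlp_dim, 1),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # Output is interpreted as the probability that the text is hallucinated.
        return torch.sigmoid(self.net(embedding))


# Any Transformer-based causal LM works; "gpt2" is only a small stand-in here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
classifier = HallucinationMLP(hidden_size=llm.config.hidden_size)


def hallucination_score(text: str) -> float:
    """Score generated text using the last token's final-layer hidden state."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = llm(**inputs)
    # hidden_states[-1] is the final layer; [:, -1, :] selects the last token.
    last_token_embedding = outputs.hidden_states[-1][:, -1, :]
    return classifier(last_token_embedding).item()


print(hallucination_score("The Eiffel Tower was completed in 1889."))
```

In an actual MIND-style setup the classifier weights would first be trained on the automatically generated Wikipedia-continuation data rather than used untrained as above; the snippet only illustrates why detection can run in real time, since it reuses hidden states the LLM already computes during generation.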