Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach


30 May 2024 | Ernesto Quevedo, Jorge Yero Salazar, Rachel Koerner, Pablo Rivas, Tomas Cerny
This paper introduces a supervised learning approach for detecting hallucinations in large language models (LLMs). Hallucinations are outputs that appear coherent but may be misleading, fictitious, or unsupported by the training data or actual facts. The method derives four numerical features from token and vocabulary probabilities produced by a separate evaluator model (LLM-Evaluator), which need not be the LLM that generated the text: minimum token probability (mtp), average token probability (avgtp), maximum LLM-E probability deviation (Mpd), and minimum LLM-E probability spread (mps). Two classifiers, a Logistic Regression (LR) and a Simple Neural Network (SNN), are trained on these features and evaluated on three benchmark datasets: HaluEval, HELM, and True-False.
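As a minimal sketch of how such features could be computed, the snippet below assumes the evaluator LLM's per-position probabilities are already available as arrays. The concrete formulas used for Mpd and mps here are illustrative assumptions, not the paper's formal definitions.

```python
# Sketch of the four token-probability features, computed from an evaluator
# LLM's per-position probabilities. The definitions of Mpd and mps below are
# assumptions made for illustration and may differ from the paper's.
import numpy as np

def hallucination_features(token_probs, vocab_probs):
    """token_probs: shape (T,)   -- evaluator probability of each generated token
    vocab_probs: shape (T, V) -- evaluator's full vocabulary distribution per position
    Returns the four scalar features [mtp, avgtp, Mpd, mps]."""
    token_probs = np.asarray(token_probs, dtype=float)
    vocab_probs = np.asarray(vocab_probs, dtype=float)

    mtp = token_probs.min()      # minimum token probability
    avgtp = token_probs.mean()   # average token probability

    # Assumed definition: gap between the evaluator's most likely token and the
    # token actually generated, maximized over positions.
    Mpd = (vocab_probs.max(axis=1) - token_probs).max()

    # Assumed definition: spread between the highest and lowest vocabulary
    # probabilities at a position, minimized over positions.
    mps = (vocab_probs.max(axis=1) - vocab_probs.min(axis=1)).min()

    return np.array([mtp, avgtp, Mpd, mps])
```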
The proposed method surpasses existing approaches on summarization and question-answering tasks and performs competitively on dialogue tasks, but it is weaker on the True-False dataset, indicating that additional features are needed in that setting. The study highlights the value of using a different LLM as the evaluator, since its probabilities can be a more reliable indicator of hallucination, and finds that smaller evaluator LLMs can match larger ones on certain tasks. Compared with other approaches such as SelfCheckGPT and MIND, the method is competitive in several cases. The authors conclude that token-probability features are a promising basis for detecting hallucinations in LLM-generated text, and that further research is needed to improve performance across scenarios.
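A minimal sketch of the classification stage follows, showing only the logistic-regression variant. The feature matrix X (one row of [mtp, avgtp, Mpd, mps] per generation) and binary hallucination labels y are assumed to have been extracted from a labelled benchmark such as HaluEval; that loading step is not shown.

```python
# Sketch of the LR classifier over the four features; the SNN variant is omitted.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train_hallucination_classifier(X, y):
    """X: (N, 4) feature matrix of [mtp, avgtp, Mpd, mps];
    y: (N,) labels where 1 marks a hallucinated generation."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf
```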