30 May 2024 | Ernesto Quevedo, Jorge Yero Salazar, Rachel Koerner, Pablo Rivas, Tomas Cerny
This paper addresses hallucinations in Large Language Models (LLMs) by proposing a supervised learning approach to detect them in generated text. The method uses two simple classifiers, Logistic Regression (LR) and a Simple Neural Network (SNN), and four numerical features derived from token and vocabulary probabilities obtained from other LLM evaluators. The approach is evaluated on three benchmarks (HaluEval, HELM, and True-False), with promising results that surpass state-of-the-art methods in multiple tasks. The study highlights the importance of feature selection and the choice of LLM evaluators, showing that different LLMs can provide more reliable indicators of hallucinations. The research also examines feature importance and the limitations of the approach, particularly on the True-False dataset. The paper concludes with recommendations for future work, including the exploration of hybrid methods and ensemble learning techniques to enhance hallucination detection.
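
To make the pipeline described above concrete, the following is a minimal sketch of the general idea: reduce an evaluator LLM's per-token probabilities to four numbers, then fit a logistic regression on them. The summary does not name the four features, so the ones used here (minimum token probability, average token probability, maximum deviation from the top vocabulary probability, and minimum vocabulary probability spread) are illustrative assumptions, as are the function names, the hand-written gradient-descent trainer, and the toy data. It should not be read as the authors' implementation.

```python
import math
from statistics import fmean


def extract_features(token_probs, vocab_max_probs, vocab_min_probs):
    """Return four illustrative features for one generated text.

    token_probs[i]     : probability the evaluator LLM assigns to the i-th generated token
    vocab_max_probs[i] : highest probability over the vocabulary at position i
    vocab_min_probs[i] : lowest probability over the vocabulary at position i
    (These inputs and the feature definitions are assumptions for illustration.)
    """
    min_token_prob = min(token_probs)
    avg_token_prob = fmean(token_probs)
    # How far the generated token falls below the evaluator's top choice, at worst.
    max_deviation = max(v - t for v, t in zip(vocab_max_probs, token_probs))
    # Smallest gap between the most and least likely vocabulary entries.
    min_spread = min(hi - lo for hi, lo in zip(vocab_max_probs, vocab_min_probs))
    return [min_token_prob, avg_token_prob, max_deviation, min_spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that the text with feature vector x is a hallucination."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    n = len(X)
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, hand-made probabilities (not taken from any benchmark).
# Each entry: (token_probs, vocab_max_probs, vocab_min_probs, label),
# where label 1 means "hallucinated" and 0 means "not hallucinated".
toy_samples = [
    ([0.90, 0.80, 0.95], [0.90, 0.80, 0.95], [0.0, 0.0, 0.0], 0),
    ([0.70, 0.85], [0.75, 0.85], [0.0, 0.0], 0),
    ([0.20, 0.10, 0.60], [0.50, 0.70, 0.60], [0.0, 0.0, 0.0], 1),
    ([0.30, 0.05, 0.40], [0.60, 0.50, 0.40], [0.0, 0.0, 0.0], 1),
]

X = [extract_features(t, hi, lo) for t, hi, lo, _ in toy_samples]
y = [label for *_, label in toy_samples]
weights, bias = train_logistic_regression(X, y)

for features, label in zip(X, y):
    print(label, round(predict_proba(weights, bias, features), 3))
```

In practice the probabilities would come from running an evaluator LLM over the generated text, and the classifier would typically be a library implementation rather than the hand-rolled trainer shown here; the stdlib-only version is used only to keep the sketch self-contained.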