To Believe or Not to Believe Your LLM

July 18, 2024 | Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári
This paper explores uncertainty quantification in large language models (LLMs), focusing on distinguishing between epistemic and aleatoric uncertainty. Epistemic uncertainty arises from a lack of knowledge about the ground truth, while aleatoric uncertainty comes from irreducible randomness in the prediction process. The authors propose an information-theoretic metric that detects when epistemic uncertainty is high, indicating potential hallucinations in model responses. The metric is derived from the joint distribution of responses generated through iterative prompting, which allows hallucinations to be detected in both single- and multi-answer scenarios.

The paper demonstrates that by iteratively prompting an LLM with its previous responses, the model's joint distribution over responses can be approximated, enabling the quantification of epistemic uncertainty. This approach is contrasted with traditional methods based on log-likelihood or entropy, which are less effective at detecting hallucinations in multi-answer cases. The authors also show that their method detects hallucinations in multi-label queries, where multiple correct answers exist.

The paper introduces a computable lower bound on epistemic uncertainty, expressed through the mutual information (MI) of the LLM's joint distribution over responses, which can be estimated with a finite-sample MI estimator. This estimator is shown to be effective at detecting hallucinations, outperforming baselines such as the likelihood of the response and entropy-based methods, especially on datasets containing both single- and multi-label samples.

The authors also analyze the mechanism by which iterative prompting affects LLM outputs, showing that repeating incorrect responses in the prompt can pull the model's predictions toward them. This is studied through the lens of the transformer architecture, where the attention mechanism is influenced by the context provided in the prompt.

The paper concludes with experiments on several question-answering benchmarks, demonstrating the effectiveness of the method in detecting hallucinations. The results show that the MI-based approach outperforms traditional baselines, particularly in scenarios with high epistemic uncertainty, and is robust in both single- and multi-label settings, providing a reliable way to assess LLM responses.
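To make the iterative-prompting idea concrete, the following is a minimal Python sketch of how one might approximate the model's joint distribution over responses by feeding previous answers back into the prompt. The `sample_response` callable and the prompt template are illustrative assumptions standing in for an actual LLM sampling call, not the authors' exact setup.

```python
# Minimal sketch of iterative prompting to sample from a pseudo joint
# distribution over responses. `sample_response` is a placeholder for an
# LLM sampling call; the prompt template is an illustrative assumption.
from collections import Counter
from typing import Callable, Dict, List, Tuple

def sample_joint_responses(
    query: str,
    sample_response: Callable[[str], str],
    chain_length: int = 3,
    num_chains: int = 20,
) -> List[Tuple[str, ...]]:
    """Draw response chains by repeatedly conditioning the model on its
    own previous answers to the same query."""
    chains = []
    for _ in range(num_chains):
        context = query
        chain = []
        for _ in range(chain_length):
            answer = sample_response(context)  # one LLM call
            chain.append(answer)
            # Iterative prompting: append the previous answer so the next
            # draw is conditioned on it.
            context += f"\nOne possible answer is: {answer}. Another answer is:"
        chains.append(tuple(chain))
    return chains

def empirical_joint(chains: List[Tuple[str, ...]]) -> Dict[Tuple[str, ...], float]:
    """Empirical estimate of the joint distribution over response chains."""
    counts = Counter(chains)
    total = sum(counts.values())
    return {chain: c / total for chain, c in counts.items()}
```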
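The sampled chains can then be scored with a simple finite-sample, plug-in estimate of the mutual information between successive responses, a simplified, pairwise version of the idea; a sketch is below. High MI means the answers are strongly coupled to the injected context (high epistemic uncertainty), which is the signal used to flag likely hallucinations. The plug-in estimator and the fixed threshold are simplifying assumptions; the paper's estimator and calibration are more involved.

```python
# Plug-in, finite-sample estimate of the mutual information between the
# first and second responses of each chain, used as a hallucination score.
# The threshold is an illustrative assumption, not a value from the paper.
import math
from collections import Counter
from typing import List, Tuple

def mutual_information_score(pairs: List[Tuple[str, str]]) -> float:
    """Plug-in MI estimate I(Y1; Y2) from (first answer, second answer) pairs.
    A near-zero score means the answers look independent (low epistemic
    uncertainty); a large score means the joint distribution is far from a
    product distribution, i.e. the model is swayed by its own prior answers."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        p_a, p_b = first[a] / n, second[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def flag_hallucination(pairs: List[Tuple[str, str]], threshold: float = 0.5) -> bool:
    """Flag the query as likely hallucinated when the MI score is large."""
    return mutual_information_score(pairs) > threshold
```

For chains produced by the sampling sketch above, `pairs` could be built from the first two answers of each chain, e.g. `[(c[0], c[1]) for c in chains]`.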