Mitigating LLM Hallucinations via Conformal Abstention


May 6, 2024 | Yasin Abbasi-Yadkori; Ilja Kuzborskij; David Stutz; András György; Adam Fisch; Arnaud Doucet; Iuliya Beloshapka; Wei-Hung Weng; Yao-Yuan Yang; Csaba Szepesvári; Ali Taylan Cemgil; Nenad Tomasev
This paper proposes a principled method for mitigating hallucinations in large language models (LLMs) via conformal abstention. The idea is to have the LLM itself evaluate the similarity between several of its sampled responses to a query and to abstain from answering when those responses disagree. Response equivalence is judged with a thresholded similarity function, and the abstention rule is calibrated with conformal prediction techniques so that the hallucination rate is provably bounded while the abstention rate stays low. The paper also shows how to calibrate the threshold of the similarity function itself, giving theoretical guarantees on the accuracy of the match prediction.

The method is evaluated on several question-answering datasets, including Temporal Sequences and TriviaQA. On Temporal Sequences, which contains long answers, conformal abstention achieves a significantly lower abstention rate than baselines based on log-probability scores; on TriviaQA, which contains short answers, it performs comparably to the baselines. Overall, the results show that conformal abstention effectively reduces hallucinations while maintaining a low abstention rate.
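To make the calibration step concrete, the sketch below illustrates one way an abstention threshold could be calibrated in the spirit described above. It assumes a scalar self-consistency score per question (the fraction of resampled responses the LLM judges equivalent to the candidate answer) and applies a conformal-risk-control-style finite-sample correction; the `llm_match` wrapper and the specific correction rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def self_consistency_score(llm_match, candidate, samples):
    # Fraction of resampled responses the LLM judges equivalent to the candidate answer.
    # `llm_match(a, b) -> bool` is a hypothetical wrapper around an LLM "do these match?" prompt.
    return float(np.mean([llm_match(candidate, s) for s in samples]))

def calibrate_threshold(scores, correct, alpha):
    """Pick the smallest threshold lam such that answering only when score >= lam
    keeps a conformally corrected empirical hallucination risk at or below alpha."""
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n = len(scores)
    # Candidate thresholds: every observed score, plus one that abstains on everything.
    candidates = np.sort(np.unique(np.append(scores, scores.max() + 1.0)))
    for lam in candidates:
        answered = scores >= lam
        risk = float(np.mean(answered & ~correct))          # empirical hallucination rate
        if (n / (n + 1)) * risk + 1.0 / (n + 1) <= alpha:   # finite-sample correction
            return lam
    return float("inf")  # no threshold meets the target: always abstain

def answer_or_abstain(score, lam):
    # At test time, answer only if the confidence score clears the calibrated threshold.
    return "answer" if score >= lam else "abstain"
```

In use, one would compute scores and correctness labels on a held-out calibration set, call calibrate_threshold once to fix the threshold, and then apply answer_or_abstain to each new query's score at test time.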