Mitigating LLM Hallucinations via Conformal Abstention

May 6, 2024 | Yasin Abbasi-Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev
The paper "Mitigating LLM Hallucinations via Conformal Abstention" addresses the issue of hallucinations in large language models (LLMs), where models may generate incorrect or nonsensical responses. The authors propose a principled approach to determine when an LLM should abstain from responding, using self-evaluation of its sampled responses and conformal prediction techniques. The method aims to balance the abstention rate and the hallucination risk, ensuring that the model either produces a likely correct response or abstains entirely. The paper evaluates the method on various datasets, demonstrating its effectiveness in reducing hallucinations while maintaining a low abstention rate. Additionally, it introduces a calibration method for determining the threshold of response similarity, which provides theoretical guarantees on the accuracy of match predictions. The experiments show that the proposed abstention policy outperforms log-probability baselines, particularly on datasets with long responses.The paper "Mitigating LLM Hallucinations via Conformal Abstention" addresses the issue of hallucinations in large language models (LLMs), where models may generate incorrect or nonsensical responses. The authors propose a principled approach to determine when an LLM should abstain from responding, using self-evaluation of its sampled responses and conformal prediction techniques. The method aims to balance the abstention rate and the hallucination risk, ensuring that the model either produces a likely correct response or abstains entirely. The paper evaluates the method on various datasets, demonstrating its effectiveness in reducing hallucinations while maintaining a low abstention rate. Additionally, it introduces a calibration method for determining the threshold of response similarity, which provides theoretical guarantees on the accuracy of match predictions. The experiments show that the proposed abstention policy outperforms log-probability baselines, particularly on datasets with long responses.