API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access

2024 | Jiayuan Su, Jing Luo, Hongwei Wang, Lu Cheng
This paper proposes a novel conformal prediction (CP) method for large language models (LLMs) that does not require access to logit information, addressing the challenge of quantifying uncertainty in API-only LLMs. The method, named LofreeCP, combines coarse-grained and fine-grained uncertainty measures to construct nonconformity scores, yielding compact prediction sets with statistical coverage guarantees. The coarse-grained measure is response frequency across repeated samples, while the fine-grained measures are normalized entropy (NE) and semantic similarity (SS): NE captures prompt-wise self-consistency, and SS measures each response's similarity to the most frequent response for the same prompt. These fine-grained measures mitigate the concentration of nonconformity scores that arises from relying on frequency alone, improving the efficiency and reliability of CP. The method is validated on both closed-ended and open-ended question-answering tasks, where it outperforms existing logit-based and logit-free baselines: LofreeCP achieves better empirical coverage rates, smaller prediction set sizes, and more accurate uncertainty estimation. The approach is evaluated on multiple LLMs, including Llama-2-7B, Llama-2-13B, WizardLM-v1.2 (13B), and Vicuna-v1.5 (7B), showing consistent performance across models and datasets, and the authors prove that the resulting prediction sets retain a rigorous statistical coverage guarantee, making the method a promising solution for uncertainty quantification in LLMs without logit access.
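To make the construction concrete, below is a minimal Python sketch of a LofreeCP-style pipeline built only from sampled API responses. The additive weighting of frequency, NE, and SS, the λ hyperparameters, and the token-overlap similarity stand-in are illustrative assumptions, not the paper's exact formulation (which, for example, uses a proper semantic-similarity model); the function names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(counts, m):
    """Prompt-wise normalized entropy (NE) of the m sampled responses."""
    if len(counts) <= 1:
        return 0.0  # all samples agree -> no measured inconsistency
    ent = -sum((c / m) * math.log(c / m) for c in counts.values())
    return ent / math.log(len(counts))  # scale to [0, 1]


def token_jaccard(a, b):
    """Toy stand-in for semantic similarity (SS); the paper relies on a
    sentence-similarity model rather than token overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def nonconformity(samples, response, lam_ne=0.1, lam_ss=0.1, sim=token_jaccard):
    """Logit-free nonconformity score for `response`, given the m sampled
    outputs for the same prompt. Lower means more conforming.

    Assumed additive combination of:
      coarse: relative frequency of `response` among the samples,
      fine:   NE of the sample distribution (penalizes inconsistency),
      fine:   SS of `response` to the most frequent sampled response.
    """
    m = len(samples)
    counts = Counter(samples)
    freq = counts.get(response, 0) / m
    top_response = counts.most_common(1)[0][0]
    ne = normalized_entropy(counts, m)
    return 1.0 - freq + lam_ne * ne - lam_ss * sim(response, top_response)


def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration score, giving (1 - alpha) marginal coverage."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points: keep every candidate
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(samples, candidates, q_hat, **score_kwargs):
    """Return all candidates whose nonconformity score is within the threshold."""
    return [c for c in candidates if nonconformity(samples, c, **score_kwargs) <= q_hat]


if __name__ == "__main__":
    # Calibration: score the known answer of each calibration prompt using
    # only the sampled responses returned by the API (toy data here).
    calibration = [
        (["Paris", "Paris", "Lyon", "Paris"], "Paris"),
        (["1945", "1944", "1945", "1945"], "1945"),
        (["blue", "green", "blue", "blue"], "blue"),
    ]
    cal_scores = [nonconformity(samples, answer) for samples, answer in calibration]
    # Tiny toy calibration set, so a loose alpha keeps the threshold finite.
    q_hat = conformal_threshold(cal_scores, alpha=0.5)

    # Test time: candidates are typically the distinct sampled responses.
    test_samples = ["Canberra", "Sydney", "Canberra", "Canberra"]
    print(prediction_set(test_samples, set(test_samples), q_hat))
```

In practice the candidate pool at test time is the set of distinct sampled responses and the λ weights are tuned on held-out data; the coverage guarantee comes from the split-conformal quantile step, not from the particular choice of score.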