API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access

2024 | Jiayuan Su, Jing Luo, Hongwei Wang, Lu Cheng
This study addresses the challenge of uncertainty quantification in large language models (LLMs) without access to logits, a critical issue for responsible AI deployment. Conformal Prediction (CP), known for its model-agnostic and distribution-free properties, is proposed as a solution. However, existing CP methods for LLMs typically require logits, which are often unavailable or miscalibrated. To tackle this, the authors introduce a novel CP method tailored for API-only LLMs, minimizing prediction set size while ensuring statistical coverage guarantees. The core idea involves formulating nonconformity measures using both a coarse-grained uncertainty notion (sample frequency) and fine-grained ones (normalized entropy and semantic similarity).
Experimental results on both close-ended and open-ended Question Answering tasks demonstrate that the proposed method outperforms logit-based CP baselines, providing more efficient and accurate uncertainty estimates. The contributions include the first CP approach for LLMs without logit-access, a novel nonconformity score function, and empirical validation of its effectiveness.
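To illustrate the general split-conformal recipe the paper builds on, the sketch below shows a frequency-based nonconformity score computed purely from repeated API samples, with no logits. This is a minimal illustration under stated assumptions, not the authors' exact method: the helper names (`calibrate_threshold`, `frequency_nonconformity`, `prediction_set`) and the list-of-strings representation of sampled answers are hypothetical, and the paper's fine-grained refinements (normalized entropy, semantic similarity) are omitted.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return the adjusted (1 - alpha)
    quantile of the calibration nonconformity scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    return float(np.quantile(cal_scores, min(q, 1.0), method="higher"))

def frequency_nonconformity(sampled_answers, candidate):
    """Coarse-grained score: 1 minus the empirical frequency of the
    candidate among repeated black-box API samples (hypothetical setup)."""
    freq = sampled_answers.count(candidate) / len(sampled_answers)
    return 1.0 - freq

def prediction_set(sampled_answers, candidates, threshold):
    """Keep every candidate whose nonconformity score is below the
    calibrated threshold; coverage holds marginally under exchangeability."""
    return [c for c in candidates
            if frequency_nonconformity(sampled_answers, c) <= threshold]
```

A frequently sampled answer gets a low nonconformity score and enters the prediction set, while rare answers are excluded; the calibrated threshold controls how conservative the sets are.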