9 Mar 2024 | Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh
The paper introduces APRICOT 🍎, a method for calibrating large language models (LLMs) using only their generated text. The goal is to build trust and maintain safety by accurately quantifying how confident the model should be in its predictions. APRICOT 🍎 trains an auxiliary model to predict the LLM's confidence, i.e., the probability that its answer is correct, from the input and output text alone, without requiring access to the LLM's internal states or sequence likelihoods. This approach is conceptually simple, does not interfere with language generation, and has multiple potential applications, such as verbalizing uncertainty or adjusting responses based on confidence.
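To make the core idea concrete, here is a minimal sketch of a text-only auxiliary calibrator: embed the question and the LLM's answer with a sentence encoder and fit a small classifier that predicts whether the answer is correct. This is illustrative only, not the paper's exact setup (which fine-tunes a dedicated auxiliary model); the encoder name, toy data, and classifier choice below are placeholders.

```python
# Illustrative sketch: predict correctness of an LLM answer from text alone.
# Assumes sentence-transformers and scikit-learn are installed; names and data
# are placeholders, not the paper's exact configuration.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any expressive sentence encoder

def build_features(questions, answers):
    # Only the text the LLM saw and produced is used -- no logits or hidden states.
    texts = [f"Question: {q}\nAnswer: {a}" for q, a in zip(questions, answers)]
    return encoder.encode(texts)

# Toy training data: correctness labels (1 = LLM answered correctly) would normally
# be obtained offline by comparing the LLM's answers against gold references.
train_questions = ["Capital of France?", "Capital of France?",
                   "Who wrote Hamlet?", "Who wrote Hamlet?"]
train_answers = ["Paris", "Lyon", "Shakespeare", "Marlowe"]
train_correct = [1, 0, 1, 0]

aux_model = LogisticRegression(max_iter=1000)
aux_model.fit(build_features(train_questions, train_answers), train_correct)

# At inference, the predicted probability of correctness serves as the confidence score.
test_feats = build_features(["Capital of Spain?"], ["Madrid"])
print(aux_model.predict_proba(test_feats)[:, 1])
```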
The method is evaluated on white-box and black-box LLMs using closed-book question-answering tasks. The results show that APRICOT 🍎 outperforms other calibration methods in calibration error, AUROC for detecting mispredictions, and Brier score. The auxiliary model learns to infer the difficulty of the LLM's task from the question text alone, and additional signals, such as chain-of-thought prompting and verbalized uncertainty, further improve performance.
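For reference, the three reported metrics can be computed from predicted confidences and binary correctness labels as sketched below; the equal-width-bin ECE implementation is the standard simplified variant and the numbers are dummy values, not results from the paper.

```python
# Sketch of the evaluation metrics: expected calibration error (ECE),
# Brier score, and AUROC for separating correct from incorrect answers.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between mean confidence and empirical accuracy in the bin,
            # weighted by the fraction of samples falling into it.
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Dummy example values for illustration only.
confidences = np.array([0.9, 0.8, 0.3, 0.6, 0.2])
correct = np.array([1, 1, 0, 0, 0])

print("ECE:  ", expected_calibration_error(confidences, correct))
print("Brier:", brier_score_loss(correct, confidences))
print("AUROC:", roc_auc_score(correct, confidences))
```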
The paper also discusses limitations, including the need for a sufficiently expressive sentence embedding model and the potential for distributional shift. Ethical considerations are addressed, emphasizing the importance of explicit validation and potential adjustments for out-of-distribution data. Overall, APRICOT 🍎 provides a robust and efficient solution for calibrating LLMs, enhancing their reliability and trustworthiness in practical applications.