27 Feb 2024 | Gustaf Ahdritz*, Tian Qin*, Nikhil Vyas, Boaz Barak, and Benjamin L. Edelman
This paper explores the feasibility of distinguishing *epistemic* uncertainty (reflecting a lack of knowledge) from *aleatoric* uncertainty (reflecting irreducible entropy in the underlying distribution) in the outputs of large language models (LLMs) over free-form text. The authors propose to separate these two types of uncertainty by comparing the outputs of a small, less powerful LLM with those of a larger, more powerful one. They find that small linear probes trained on the embeddings of frozen, pre-trained models can accurately predict, at the token level, when the larger model will be more confident, and that these probes generalize well across text domains. Additionally, they propose a fully unsupervised method, the *In-Context Learning Test* (ICLT), which achieves non-trivial accuracy on the same task. The results suggest that LLMs naturally contain internal representations of different types of uncertainty, which could be leveraged to build more informative indicators of model confidence in practical applications. The paper discusses the implications of these findings for improving the truthfulness of LLM-generated content and reducing hallucinations.
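To make the probing idea concrete, the sketch below trains a linear probe on a small frozen model's hidden states to predict a larger frozen model's token-level confidence. This is only a minimal illustration of the general setup, not the paper's exact recipe: the model pair (GPT-2 / GPT-2 XL), the choice of regressing the large model's next-token entropy, and all helper names are assumptions introduced here for clarity.

```python
# Hypothetical sketch: probe a small frozen LM's embeddings for the large LM's
# token-level uncertainty. Model choices and the entropy-regression target are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 small and XL share a tokenizer
small = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()
large = AutoModelForCausalLM.from_pretrained("gpt2-xl").to(device).eval()

@torch.no_grad()
def features_and_targets(texts):
    """Small-model final hidden states (features) paired with the large model's
    next-token entropies (targets), one pair per token position."""
    feats, targets = [], []
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids.to(device)
        hidden = small(ids, output_hidden_states=True).hidden_states[-1][0]  # (T, d)
        logp = F.log_softmax(large(ids).logits[0], dim=-1)                   # (T, V)
        entropy = -(logp.exp() * logp).sum(-1)                               # (T,)
        feats.append(hidden)
        targets.append(entropy)
    return torch.cat(feats), torch.cat(targets)

X, y = features_and_targets([
    "The capital of France is Paris.",      # mostly epistemic: one right answer
    "My favorite color is probably blue.",  # mostly aleatoric: many valid continuations
])

probe = torch.nn.Linear(X.shape[1], 1).to(device)  # the small linear probe
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(probe(X).squeeze(-1), y)
    loss.backward()
    opt.step()
```

The intuition matches the summary above: where the small model is uncertain for epistemic reasons, the large model's entropy should drop, and a probe on the small model's embeddings can learn to anticipate this; where the uncertainty is aleatoric, both models should remain high-entropy. The unsupervised ICLT variant pursues the same question without labels, roughly by checking whether showing the model relevant context in-prompt collapses its uncertainty.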