28 Mar 2024 | Chen Ling, Xujiang Zhao, Xuchao Zhang, Wei Cheng, Yanchi Liu, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Jie Ji, Guangji Bai, Liang Zhao, Haifeng Chen
This paper addresses the issue of uncertainty in large language models (LLMs) during in-context learning, a capability that has revolutionized various fields by enabling LLMs to understand and respond to tasks based on a few task-relevant demonstrations. The authors propose a novel framework to decompose the predictive uncertainty of LLMs into aleatoric and epistemic components. Aleatoric uncertainty arises from the quality of the provided demonstrations, while epistemic uncertainty stems from ambiguities in the model's configuration. The framework adopts a Bayesian view to quantify these uncertainties and introduces an entropy-based estimation method to handle the free-form outputs of LLMs. Extensive experiments on natural language understanding tasks demonstrate the effectiveness of the proposed method, showing that it outperforms existing uncertainty quantification methods at flagging misclassified samples and detecting out-of-domain demonstrations. The paper also discusses how the method generalizes across LLMs of different sizes and highlights its robust performance across various datasets.
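Because the summary describes the entropy-based decomposition only at a high level, the sketch below illustrates the standard mutual-information split of predictive entropy computed from sampled free-form answers. It is a minimal, generic illustration, not the paper's implementation: the function names, the grouping over demonstration sets, and the example answers are all assumptions, and how the two resulting terms map onto the paper's aleatoric and epistemic components depends on which factor (demonstrations vs. model configurations) is varied during sampling.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in nats) of an empirical answer distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def decompose_uncertainty(samples_by_group):
    """
    samples_by_group: list of lists; each inner list holds normalized free-form
    answers sampled from the LLM under one setting (e.g., one demonstration set
    or one model configuration -- illustrative choice, not the paper's exact setup).

    Returns (total, expected_conditional, mutual_info):
      total               = H(marginal over all answers)
      expected_conditional = average per-group entropy
      mutual_info          = total - expected_conditional (disagreement across groups)
    """
    # Marginal answer distribution, pooling samples from all groups.
    marginal = Counter(a for answers in samples_by_group for a in answers)
    total = entropy(marginal)

    # Expected entropy within each group (uncertainty that remains once the
    # varied factor is fixed).
    expected_conditional = sum(
        entropy(Counter(answers)) for answers in samples_by_group
    ) / len(samples_by_group)

    # Disagreement attributable to the varied factor.
    mutual_info = total - expected_conditional
    return total, expected_conditional, mutual_info

# Hypothetical example: three demonstration sets, five sampled answers each.
samples = [
    ["positive", "positive", "negative", "positive", "positive"],
    ["positive", "positive", "positive", "positive", "positive"],
    ["negative", "positive", "negative", "negative", "positive"],
]
print(decompose_uncertainty(samples))
```

In this generic split, the within-group term captures uncertainty that persists even when the varied factor is held fixed, while the mutual-information term captures disagreement induced by that factor; the paper's framework attributes such terms to demonstration quality (aleatoric) and model configuration (epistemic) respectively.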