Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

2024 | Fabian Falck, Ziyu Wang, Chris Holmes
This paper investigates whether in-context learning (ICL) in large language models (LLMs) is Bayesian, using the martingale property as the key criterion. The martingale property is a fundamental requirement of any Bayesian learning system for exchangeable data: it ensures that predictions are insensitive to the order of the observed data and that the predictive distribution remains consistent, in expectation, when the model conditions on data drawn from its own predictive. The authors argue that this property is necessary for unambiguous predictions and for a principled notion of uncertainty in LLMs. They derive actionable diagnostics and accompanying test statistics to check whether the martingale property holds, and find that state-of-the-art LLMs such as Llama2, Mistral, GPT-3.5, and GPT-4 violate it in certain settings, indicating that ICL in these models is not Bayesian. The study also examines how predictive uncertainty in LLMs behaves as the in-context dataset grows, and observes deviations from the scaling a Bayesian model would exhibit. Together, the results suggest that ICL in current LLMs does not follow a principled Bayesian framework, and that the martingale property is crucial for reliable and interpretable predictions in safety-critical applications. The findings have important implications for the development of trustworthy AI systems.
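
Concretely, the martingale property the authors test can be stated as follows. This is a minimal formalization based on the summary above; the notation, with p(y | y_{1:n}) denoting the model's one-step-ahead predictive after observing y_1, ..., y_n, is ours and may differ from the paper's:

\[
  \mathbb{E}_{Y_{n+1} \sim p(\cdot \mid y_{1:n})}\!\left[\, p(y \mid y_{1:n}, Y_{n+1}) \,\right] \;=\; p(y \mid y_{1:n}) \qquad \text{for all } y \text{ and } n.
\]

In words: if the learner samples the next observation from its own predictive and then conditions on it, its predictive distribution must be unchanged in expectation. A systematic deviation from this identity rules out an exchangeable Bayesian model underlying ICL.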
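
To make the diagnostic idea concrete, below is a small Monte Carlo sketch of such a check. It is illustrative only: the names (martingale_gap, predictive) and the interface, a callable returning a probability vector over a finite outcome space, are our assumptions, not the paper's actual test statistics or code.

import numpy as np

def martingale_gap(predictive, context, n_samples=200, seed=None):
    """Monte Carlo estimate of a martingale-property violation.

    `predictive(context)` is a hypothetical interface returning a 1-D
    numpy array of probabilities over a finite outcome space. For a
    Bayesian learner on exchangeable data, averaging the updated
    predictive p(. | context + [y]) over y ~ p(. | context) recovers
    p(. | context), so the returned gap should be ~0 up to noise.
    """
    rng = np.random.default_rng(seed)
    p_now = predictive(context)                  # p(. | y_{1:n})
    outcomes = np.arange(len(p_now))

    avg_next = np.zeros_like(p_now)
    for _ in range(n_samples):
        y = rng.choice(outcomes, p=p_now)        # sample from the model itself
        avg_next += predictive(context + [y])    # p(. | y_{1:n}, y)
    avg_next /= n_samples

    # Total-variation distance between the two predictives; values far
    # from zero (beyond Monte Carlo error) signal a martingale violation.
    return 0.5 * np.abs(avg_next - p_now).sum()

# Sanity check on a genuinely Bayesian toy model: the Beta(1,1)-Bernoulli
# predictive (Laplace's rule of succession) satisfies the martingale
# property exactly, so the estimated gap should be near zero.
def beta_bernoulli(context):
    p1 = (sum(context) + 1) / (len(context) + 2)
    return np.array([1.0 - p1, p1])

print(martingale_gap(beta_bernoulli, [1, 0, 1], n_samples=2000, seed=0))

Applied to an LLM, predictive would wrap the model's next-token distribution over a restricted label set; the paper's diagnostics are formal test statistics with calibrated nulls rather than this raw gap.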