2 Jun 2024 | Fabian Falck*, Ziyu Wang*, Chris Holmes
This paper investigates whether in-context learning (ICL) in large language models (LLMs) is approximately Bayesian, focusing on the martingale property as a key characteristic of Bayesian learning systems. The martingale property ensures that the predictive distribution of an LLM is invariant to missing data from a population, which is crucial for unambiguous predictions and principled uncertainty estimation. The authors derive diagnostics to test the martingale property and find that state-of-the-art LLMs, such as Llama2, Mistral, GPT-3.5, and GPT-4, violate it in certain settings. Through synthetic experiments, they provide evidence that these models deviate from Bayesian principles and exhibit introspective hallucinations, in which their predictions change when they are queried on data they themselves generated. The findings have implications for the use of LLMs in exchangeable and safety-critical applications and highlight the need for more robust uncertainty estimation. The paper concludes by discussing limitations and potential future directions, including the need for models that better adhere to the martingale property.
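To make the central property concrete, here is a minimal statement of the martingale condition in our own notation (the paper's formal definitions may differ in detail). Writing $p(\cdot \mid x_{1:n})$ for the model's one-step-ahead predictive distribution after observing a context $x_{1:n}$, a Bayesian learner on exchangeable data satisfies

$$\mathbb{E}_{X_{n+1} \sim p(\cdot \mid x_{1:n})}\bigl[\, p(y \mid x_{1:n}, X_{n+1}) \,\bigr] \;=\; p(y \mid x_{1:n}) \quad \text{for all } y,$$

i.e. averaging the predictive over the model's own imputation of the next observation should leave the predictive unchanged.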
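The diagnostics described in the paper build on this idea; the sketch below is an illustrative version of such a check, not the authors' code. It assumes a hypothetical `predictive(context)` interface that returns the model's distribution over the next observation as a dictionary, and estimates the deviation from the martingale condition by Monte Carlo.

```python
import numpy as np

def martingale_gap(predictive, context, num_samples=100):
    """Estimate how far a model's one-step-ahead predictive is from the
    martingale property: E[p(y | context, X_new)] should equal p(y | context)
    when X_new is sampled from the model's own predictive.

    `predictive(context)` is a hypothetical interface returning a dict that
    maps each candidate next observation y to its probability.
    """
    base = predictive(context)                       # p(y | x_{1:n})
    ys = list(base.keys())
    probs = np.array([base[y] for y in ys], dtype=float)
    probs = probs / probs.sum()                      # guard against rounding

    # Average the re-queried predictive over self-sampled imputations X_new.
    averaged = np.zeros(len(ys))
    for _ in range(num_samples):
        x_new = np.random.choice(ys, p=probs)        # X_new ~ p(. | x_{1:n})
        updated = predictive(context + [x_new])      # p(y | x_{1:n}, X_new)
        averaged += np.array([updated.get(y, 0.0) for y in ys])
    averaged /= num_samples

    # For an (approximately) Bayesian learner this gap should be close to 0.
    return np.abs(averaged - probs).sum()
```

A non-negligible gap on exchangeable synthetic data would indicate the kind of martingale violation, and the introspective inconsistency under self-querying, that the paper reports.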