The Calibration Gap between Model and Human Confidence in Large Language Models

24 Jan 2024 | Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas Mayer, Padhraic Smyth
This paper investigates the calibration gap between model confidence and human confidence in large language models (LLMs). The study explores how well LLMs can communicate their internal confidence to human users and how users perceive the reliability of LLM outputs. Through experiments with multiple-choice questions, the research examines human perception of LLM confidence and the impact of tailored explanations on that perception. The findings show that default explanations from LLMs often lead users to overestimate both the model's confidence and its accuracy. When explanations are modified to better reflect the LLM's internal confidence, user perception aligns more closely with the model's actual confidence levels, demonstrating the potential to enhance user trust and the accuracy with which users assess LLM outputs.

The study highlights the importance of transparently communicating confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential. The research also shows that human confidence in LLM outputs is significantly affected by the type of explanation provided, with modified explanations leading to improved calibration and discrimination performance (see the illustrative sketch below). The behavioral experiments used two state-of-the-art LLMs, GPT-3.5 and PaLM2, together with a subset of the MMLU dataset. The results indicate that participants' accuracy in answering questions was generally low and that self-assessed expertise did not significantly affect performance.
The study concludes that clear and accurate communication is critical in the interaction between users and LLMs, and that enhancing the alignment between model confidence and user perception can lead to a more responsible and trustworthy use of LLMs.
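To make the calibration and discrimination measures above concrete, here is a minimal, illustrative Python sketch; it is not the authors' code. It assumes (a) a model's confidence on a multiple-choice question can be derived by taking a softmax over per-option log probabilities, and (b) calibration is summarized with expected calibration error (ECE) while discrimination is summarized with AUC. The function names and the toy data are hypothetical stand-ins for the paper's behavioral measurements.

```python
# Illustrative sketch only (not the paper's code). Assumes model confidence comes
# from a softmax over per-option log probabilities, and that calibration/discrimination
# are summarized with ECE/AUC. Variable names and toy data are hypothetical.
import numpy as np
from sklearn.metrics import roc_auc_score


def option_confidence(option_logprobs):
    """Softmax over per-option log probabilities; returns the chosen option index
    and the model's confidence in that choice."""
    logits = np.asarray(option_logprobs, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    choice = int(np.argmax(probs))
    return choice, float(probs[choice])


def expected_calibration_error(conf, correct, n_bins=10):
    """Weighted average gap between mean confidence and empirical accuracy per bin."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf <= hi) if hi == 1.0 else (conf >= lo) & (conf < hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece


# Example: per-option log probabilities for one 4-choice question (made up).
choice, conf = option_confidence([-2.3, -0.1, -3.0, -4.2])
print("chosen option:", "ABCD"[choice], "with confidence", round(conf, 3))

# Toy data standing in for an MMLU-style experiment (made up, illustration only).
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=300)                             # 1 = LLM answered correctly
model_conf = np.clip(0.5 + 0.35 * correct + 0.1 * rng.random(300), 0.0, 1.0)
human_conf = np.clip(model_conf + 0.15, 0.0, 1.0)                  # humans overestimating

print("model ECE:", round(expected_calibration_error(model_conf, correct), 3))
print("human ECE:", round(expected_calibration_error(human_conf, correct), 3))   # calibration
print("model AUC:", round(roc_auc_score(correct, model_conf), 3))
print("human AUC:", round(roc_auc_score(correct, human_conf), 3))                # discrimination
```

Under these toy numbers, the constant human overestimation inflates ECE without changing AUC, which separates the two notions: calibration asks whether stated confidence matches accuracy, while discrimination asks whether higher confidence picks out the correct answers.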