9 Jul 2024 | Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap
This paper investigates how language models (LMs) communicate uncertainty in natural language and how users respond to those expressions. The study finds that LMs are reluctant to express uncertainty even when their answers are incorrect, and that when prompted to do so they tend to be overconfident, producing high error rates among confidently worded responses. Human experiments show that users rely heavily on LM-generated responses whether or not the responses carry certainty markers, and that people are biased against text containing expressions of uncertainty, which can lead to misinterpretation and overreliance on AI.

The paper highlights the safety risks of LM overconfidence and proposes design recommendations and mitigation strategies: LMs should emit expressions of uncertainty autonomously rather than only when prompted, calibration should be treated as context-dependent, and more diverse data sources are needed to improve how epistemic markers are generated and interpreted. The authors also note that the bias against hedged text may extend to the human annotators whose preferences shape model training. The paper concludes that LMs should verbalize uncertainty in ways that increase cognitive engagement and reduce human overreliance on AI.
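To make the evaluation idea concrete, here is a minimal sketch of how one might prompt a model to verbalize its confidence and then bucket accuracy by the epistemic marker it emits. This is not the paper's actual experimental setup: the prompt wording, the marker lists, and the `ask` callable are illustrative assumptions, and the paper works with a much broader, data-derived set of markers.

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple

# Illustrative epistemic markers grouped by the confidence they verbalize.
# (Assumed lists for this sketch; not the paper's marker inventory.)
STRENGTHENERS = ("i'm certain", "definitely", "without a doubt")
WEAKENERS = ("i think", "i'm not sure", "it could be", "possibly")

def uncertainty_prompt(question: str) -> str:
    """Wrap a question so the model is asked to verbalize its confidence."""
    return (
        f"{question}\n"
        "Answer in one sentence, and begin with a phrase that reflects how "
        "confident you are (e.g., 'I'm certain that...' or 'I think...')."
    )

def marker_strength(response: str) -> str:
    """Classify a response as confident / hedged / unmarked via substring matching."""
    text = response.lower()
    if any(m in text for m in STRENGTHENERS):
        return "confident"
    if any(m in text for m in WEAKENERS):
        return "hedged"
    return "unmarked"

def accuracy_by_marker(
    ask: Callable[[str], str],
    items: Iterable[Tuple[str, str]],
) -> dict:
    """Bucket answer accuracy by the epistemic marker the model chose to emit.

    `ask` is any question -> response callable (an API wrapper, a local model,
    or a stub); `items` yields (question, gold_answer) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, gold in items:
        response = ask(uncertainty_prompt(question))
        bucket = marker_strength(response)
        total[bucket] += 1
        correct[bucket] += int(gold.lower() in response.lower())
    return {bucket: correct[bucket] / total[bucket] for bucket in total}

if __name__ == "__main__":
    # Stubbed model: always answers confidently and is sometimes wrong,
    # mirroring the overconfidence pattern the paper reports.
    def stub_model(prompt: str) -> str:
        return "I'm certain the answer is Paris."

    qa = [("What is the capital of France?", "Paris"),
          ("What is the capital of Spain?", "Madrid")]
    print(accuracy_by_marker(stub_model, qa))  # e.g. {'confident': 0.5}
```

A low accuracy in the "confident" bucket is the kind of signal the paper flags as a safety risk: confidently worded answers that are frequently wrong invite user overreliance.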