30 Jul 2024 | Ryan Liu*, Jiayi Geng*, Joshua C. Peterson, Ilia Sucholutsky, Thomas L. Griffiths
The paper investigates how Large Language Models (LLMs) model human decision-making, particularly in the context of rationality. It finds that while LLMs are trained to align with human preferences, their implicit models of human behavior often assume that people are more rational than they actually are. This is evident in two main experiments: forward modeling, where LLMs predict human choices, and inverse modeling, where LLMs infer human preferences from choices.
In the forward modeling experiment, LLMs, even with chain-of-thought prompting, consistently predict that humans act more rationally than they actually do. For example, LLM predictions with zero-shot prompting correlate with human choices at 0.60, while a rational model that simply maximizes expected value correlates with human choices at only 0.48. With chain-of-thought prompting, LLM predictions correlate with that rational expected-value model at 0.93–0.94.
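To make the forward-modeling comparison concrete, here is a minimal sketch, not taken from the paper: a rational expected-value baseline for binary risky choices and the kind of correlation analysis described above. The gambles, human choice rates, and LLM predictions in the code are all hypothetical placeholders.

```python
# Illustrative sketch only: a rational expected-value baseline for binary
# risky choices, plus the kind of correlation analysis summarized above.
# The gambles, human choice rates, and LLM predictions are hypothetical.
import numpy as np

def expected_value(outcomes, probs):
    """Expected value of a gamble: sum of outcome * probability."""
    return float(np.dot(outcomes, probs))

def rational_choice(gamble_a, gamble_b):
    """A fully rational agent picks whichever gamble has the higher EV."""
    ev_a, ev_b = expected_value(*gamble_a), expected_value(*gamble_b)
    return 1.0 if ev_a > ev_b else (0.0 if ev_a < ev_b else 0.5)

# Each problem is a pair of gambles; a gamble is (outcomes, probabilities).
problems = [
    (([100, 0], [0.5, 0.5]), ([45, 45], [0.5, 0.5])),    # EV: 50 vs 45
    (([10, -5], [0.8, 0.2]), ([30, -40], [0.5, 0.5])),   # EV:  7 vs -5
    (([5, 5], [0.5, 0.5]), ([20, -2], [0.4, 0.6])),      # EV:  5 vs 6.8
    (([200, -100], [0.5, 0.5]), ([60, 60], [0.5, 0.5])), # EV: 50 vs 60
]

rational = np.array([rational_choice(a, b) for a, b in problems])
human = np.array([0.35, 0.70, 0.55, 0.80])  # hypothetical human choice rates
llm = np.array([0.90, 0.85, 0.20, 0.30])    # hypothetical LLM-predicted rates

# Pearson correlations analogous to those reported in the summary.
print("LLM vs. human:   ", np.corrcoef(llm, human)[0, 1])
print("LLM vs. rational:", np.corrcoef(llm, rational)[0, 1])
```

A noiseless expected-value maximizer is the strictest notion of rationality here; softer variants (e.g., adding a choice temperature) are common in behavioral modeling, but the basic comparison logic is the same.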
In the inverse modeling experiment, the preferences LLMs infer from observed choices are highly correlated with both absolute and relative utility models, and these rational models in turn reach Spearman correlations of 0.98 with human inferences. The LLMs' inferences also closely track humans' own inferences, reaching a correlation of 0.97 with human rankings.
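The inverse direction can be illustrated with a similarly hedged sketch: assuming a decision-maker who noisily maximizes expected value (a softmax/Luce choice rule, used here purely for illustration), one can infer the subjective value of a prize from their observed choices and compare the inferred ranking against human inferences with a Spearman correlation. The observations, prize names, and parameters below are hypothetical, not the paper's.

```python
# Illustrative sketch only: inferring how much someone values a prize from
# their choices, assuming they noisily maximize expected value (a softmax /
# Luce choice rule). All observations and parameters here are hypothetical.
import numpy as np
from scipy.stats import spearmanr

def p_choose_gamble(value, p_win, sure_amount, beta=1.0):
    """P(choose 'win the prize with prob p_win' over a sure amount)."""
    return 1.0 / (1.0 + np.exp(-beta * (p_win * value - sure_amount)))

def infer_value(observations, grid=np.linspace(0, 100, 401), beta=1.0):
    """Maximum-likelihood estimate of the prize's subjective value."""
    def log_lik(v):
        ll = 0.0
        for p_win, sure, chose_gamble in observations:
            p = np.clip(p_choose_gamble(v, p_win, sure, beta), 1e-12, 1 - 1e-12)
            ll += np.log(p if chose_gamble else 1.0 - p)
        return ll
    return grid[int(np.argmax([log_lik(v) for v in grid]))]

# Hypothetical observed choices for three prizes:
# each tuple is (p_win, sure_amount, chose_gamble).
observed = {
    "prize_A": [(0.5, 10, True), (0.5, 30, False)],  # took gamble vs. $10, declined vs. $30
    "prize_B": [(0.5, 10, True), (0.5, 30, True)],   # took the gamble both times
    "prize_C": [(0.5, 10, False)],                   # declined even vs. $10
}
inferred = {name: infer_value(obs) for name, obs in observed.items()}

# Compare the inferred values with hypothetical human inferences,
# using a Spearman rank correlation as in the summary above.
human_inferences = {"prize_A": 40, "prize_B": 70, "prize_C": 10}  # hypothetical
names = list(observed)
rho, _ = spearmanr([inferred[n] for n in names],
                   [human_inferences[n] for n in names])
print(inferred, rho)
```

The key assumption doing the work is the rational choice rule inside the likelihood: whatever rule the observer assumes people follow determines which preferences get read off from the same choices.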
The paper discusses the implications of these findings for aligning LLMs with human expectations versus human behavior, and suggests that different training paradigms may be needed for each. It also highlights the potential for cognitive science to help understand and improve LLM alignment.