March 26, 2024

Chanwoo Park* (Massachusetts Institute of Technology, cspark97@mit.edu); Xiangyu Liu* (University of Maryland, College Park, xyliu99@umd.edu); Asuman Ozdaglar (Massachusetts Institute of Technology, asuman@mit.edu); Kaiqing Zhang (University of Maryland, College Park, kaiqing@umd.edu)
The paper investigates the performance of large language models (LLMs) in decision-making, particularly in multi-agent settings, through the lens of regret metrics. The authors study LLMs in online learning and game-theoretic settings, focusing on their no-regret behaviors. They empirically examine several pre-trained LLMs, including GPT-4, on non-stationary online learning problems and repeated games. The study reveals that LLMs often exhibit no-regret behaviors, but it also identifies cases where even advanced LLMs such as GPT-4 fail to be no-regret. To address these failure cases, the authors propose a novel unsupervised training loss, the *regret-loss*, which requires no labels of optimal actions. The paper provides theoretical insight into the no-regret behaviors of LLMs under assumptions on the supervised pre-training procedure and on the rationality of the human decision-makers who generate the training data. The regret-loss is shown to enjoy statistical and optimization guarantees, and experiments demonstrate its effectiveness in addressing the "regrettable" cases. The work contributes to a better understanding of LLMs' decision-making capabilities and suggests ways to enhance their no-regret properties.
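To make the evaluation metric concrete, below is a minimal Python sketch of how regret is typically measured in this kind of online learning setting: each round the agent plays a mixed strategy over a finite action set, an adversary reveals a loss vector, and regret compares the agent's cumulative expected loss to that of the best fixed action in hindsight. The `agent_policy` stand-in (a simple follow-the-leader rule) and all parameter choices are illustrative assumptions, not the paper's experimental setup; in the paper, the policy at each round would instead come from prompting an LLM with the loss history.

```python
import numpy as np

# Illustrative sketch of the regret metric (assumed setup, not the paper's code):
# losses are vectors over a finite action set, the agent plays a mixed strategy
# each round, and regret is measured against the best fixed action in hindsight.

rng = np.random.default_rng(0)
T, n_actions = 100, 3
losses = rng.uniform(size=(T, n_actions))           # one loss vector per round

def agent_policy(history):
    """Placeholder for the LLM agent: here, follow-the-leader on past losses."""
    if not history:
        return np.full(n_actions, 1.0 / n_actions)  # uniform on the first round
    cum = np.sum(history, axis=0)
    policy = np.zeros(n_actions)
    policy[np.argmin(cum)] = 1.0                    # all mass on the best action so far
    return policy

history, agent_loss = [], 0.0
for t in range(T):
    pi_t = agent_policy(history)
    agent_loss += pi_t @ losses[t]                  # expected loss of the mixed strategy
    history.append(losses[t])

best_fixed = losses.sum(axis=0).min()               # best single action in hindsight
regret = agent_loss - best_fixed
print(f"cumulative regret after T={T} rounds: {regret:.3f}")
```

An algorithm is *no-regret* when this quantity grows sublinearly in T, i.e., regret/T → 0. The proposed regret-loss trains the model to minimize a function of exactly this quantity over sampled loss sequences, which is why no labels of optimal actions are needed.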