Evaluating Language Model Agency through Negotiations

This paper introduces a method for evaluating the agency of language models (LMs) through negotiation games. The approach supports multi-turn and cross-model interactions, lets task complexity be adjusted, and avoids accidental data leakage. Six widely used LMs were tested in both self-play and cross-play settings. Key findings include: only the closed-source models completed the tasks; cooperative bargaining was the most challenging setting; and even the most powerful models sometimes lost to weaker opponents.

The study highlights the need for dynamic, co-evolving benchmarks that assess both performance and alignment. Negotiation games are built from simple segments, can be scaled up in complexity, and are well suited to analyzing alignment. The framework includes an open-source library and data ("LAMEN transcripts") for replication. The paper defines structured negotiation games, distinguishing distributive from compatible issues and integrative from non-integrative games. It also discusses task complexity, co-evolving benchmarks, and state-of-mind consistency.

The experimental setup evaluated several models, including OpenAI's gpt-3.5 and gpt-4, Google's chat-bison, Anthropic's claude-2, Cohere's command and command-light, and Meta's LLaMA 2. Results showed that gpt-4 had superior faithfulness and instruction-following metrics but lower agreement rates. The study emphasizes the importance of evaluating LMs in realistic, multi-agent scenarios to ensure safety and alignment. The paper also discusses limitations, including high costs and ethical considerations. Overall, it proposes negotiation games as a promising alternative to static benchmarks for language model evaluation.
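The distinction between distributive and compatible issues, and between integrative and non-integrative games, can be made concrete with a small sketch. This is an illustration only, not the paper's library: the class and function names (`Issue`, `Game`, `issue_kind`), the integer payoff tables, and the rental example are all hypothetical. It assumes the usual reading of the terms: an issue is distributive when the two sides' payoffs move in opposite directions, compatible when they move together, and a game is integrative when the two sides weight the issues differently.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Tuple


@dataclass(frozen=True)
class Issue:
    """One negotiable issue; payoffs are indexed by option (hypothetical format)."""
    name: str
    payoffs_a: Tuple[int, ...]  # payoff to agent A for each option
    payoffs_b: Tuple[int, ...]  # payoff to agent B for each option


def issue_kind(issue: Issue) -> str:
    """Classify an issue by how the two agents' payoffs co-vary across options."""
    pairs = list(zip(issue.payoffs_a, issue.payoffs_b))
    # For every pair of options: positive if both agents prefer the same one,
    # negative if their preferences are opposed.
    products = [
        (a1 - a2) * (b1 - b2)
        for (a1, b1), (a2, b2) in combinations(pairs, 2)
    ]
    if any(p > 0 for p in products) and all(p >= 0 for p in products):
        return "compatible"
    if any(p < 0 for p in products) and all(p <= 0 for p in products):
        return "distributive"
    return "mixed"


@dataclass(frozen=True)
class Game:
    """A game is a set of issues plus per-agent importance weights (assumed integers)."""
    issues: Tuple[Issue, ...]
    weights_a: Tuple[int, ...]
    weights_b: Tuple[int, ...]

    def is_integrative(self) -> bool:
        # Assumption: differing weights leave room for trade-offs across issues.
        return self.weights_a != self.weights_b

    def utilities(self, agreement: Tuple[int, ...]) -> Tuple[int, int]:
        """Weighted payoff for each agent, given one chosen option per issue."""
        ua = sum(w * i.payoffs_a[o]
                 for w, i, o in zip(self.weights_a, self.issues, agreement))
        ub = sum(w * i.payoffs_b[o]
                 for w, i, o in zip(self.weights_b, self.issues, agreement))
        return ua, ub


# Hypothetical rental negotiation: rent is a fixed pie, move-in date is not.
rent = Issue("rent", payoffs_a=(0, 5, 10), payoffs_b=(10, 5, 0))
move_in = Issue("move-in date", payoffs_a=(0, 5, 10), payoffs_b=(0, 5, 10))
game = Game(issues=(rent, move_in), weights_a=(7, 3), weights_b=(3, 7))

print(issue_kind(rent), issue_kind(move_in), game.is_integrative())
print(game.utilities((1, 2)))  # middle rent option, best move-in date
```

Integer weights keep the arithmetic exact. Composing more issues into a `Game` is one simple way to picture how complexity can be scaled up from simple segments.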

2024 | Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West