EVALUATING LANGUAGE MODEL AGENCY THROUGH NEGOTIATIONS

16 Mar 2024 | Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West
The paper introduces a novel approach to evaluating language model (LM) agency using negotiation games, which better reflect real-world use cases and address the shortcomings of traditional LM benchmarks. Negotiation games allow for multi-turn and cross-model interactions, modulate complexity, and prevent accidental data leakage. The authors test six widely used and publicly accessible LMs, evaluating their performance and alignment in both self-play and cross-play settings. Key findings include:

1. Only closed-source models were able to complete the tasks.
2. Cooperative bargaining games proved the most challenging for the models.
3. Even the most powerful models sometimes "lose" to weaker opponents.

The paper also discusses the limitations and ethical considerations of the approach, emphasizing the need for further research to safely integrate LMs into society. The authors release an open-source library and all data generated during the project to facilitate replication and extension of their findings.