R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

18 Feb 2024 | Tongxin Yuan*, Zhiwei He*, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang†, Rui Wang, Gongshen Liu
This paper introduces R-Judge, a benchmark designed to evaluate the safety risk awareness of large language models (LLMs) in interactive environments. The benchmark consists of 162 multi-turn agent interaction records, covering 27 key risk scenarios across 7 application categories and 10 risk types. Each record includes a user instruction, agent actions, and environment feedback, with human-annotated safety labels and detailed risk descriptions. The evaluation of 9 LLMs on R-Judge reveals significant room for improvement in risk awareness, with GPT-4 achieving an F1 score of 72.52% compared to a human score of 89.07%. Further experiments show that leveraging risk descriptions as environment feedback significantly improves model performance. Case studies highlight that risk awareness in open agent scenarios is a multi-dimensional capability involving knowledge and reasoning, which remains challenging for current LLMs. The paper concludes with insights into the development of risk-aware LLM agents, emphasizing the need for general capabilities and scenario-specific safety guidelines.
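As a rough illustration of the reported evaluation setup (not the authors' released code), the sketch below shows how an F1 score could be computed for binary safe/unsafe judgments over interaction records. The record structure, the `judge_record` function, and the example model are assumptions for illustration only; the actual R-Judge evaluation pipeline may differ.

```python
from typing import Callable

def f1_score(preds: list[int], labels: list[int]) -> float:
    """Compute F1 for binary labels where 1 = unsafe (positive class)."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate(records: list[dict], judge_record: Callable[[dict], int]) -> float:
    """Run a model's safety judgment over each record and score against human labels.

    Each record is assumed to hold a multi-turn interaction and a human
    annotation: {"interaction": [...], "label": 0 or 1}.
    `judge_record` is a hypothetical wrapper that prompts an LLM with the
    interaction and parses its safe/unsafe verdict into 0 or 1.
    """
    preds = [judge_record(r) for r in records]
    labels = [r["label"] for r in records]
    return f1_score(preds, labels)
```

Under this framing, an F1 of 72.52% for GPT-4 versus 89.07% for humans quantifies the gap the paper highlights: the model both misses unsafe interactions (lower recall) and flags safe ones (lower precision) more often than human annotators.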