GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

29 May 2025 | Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
The paper introduces GuardAgent, a novel framework designed to protect large language model (LLM) agents by dynamically checking their actions against given safety guard requests. GuardAgent leverages the reasoning capabilities of LLMs and in-context demonstrations from a memory module to generate and execute guardrail code, ensuring that target agents adhere to safety and privacy policies. The framework is characterized by its flexibility, reliability, and low operational overhead. Two novel benchmarks, EICU-AC and Mind2Web-SC, are proposed to evaluate access control for healthcare agents and safety control for web agents, respectively. Experiments demonstrate that GuardAgent effectively mitigates violation actions with high accuracy, achieving over 98% and 83% on these benchmarks, respectively, without affecting the task performance of the target agents. The paper also includes ablation studies that validate the effectiveness of individual components of GuardAgent, such as memory retrieval and the toolbox of functions.
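To make the guardrail idea concrete, the sketch below illustrates the general pattern the abstract describes: a check function that inspects a target agent's proposed action against an access-control policy (in the spirit of the EICU-AC benchmark) before the action is allowed to execute. This is a minimal illustration, not GuardAgent's actual generated code; all names here (`Policy`, `check_action`, the example roles and tables) are hypothetical.

```python
# Hypothetical sketch of a guardrail check: deny an action if the agent's
# role requests data outside its permitted scope. Illustrative only; not
# the code GuardAgent generates.
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Maps a role to the set of database tables that role may read
    # (an EICU-AC-style access-control rule; contents are invented).
    allowed_tables: dict = field(default_factory=dict)


def check_action(policy, role, requested_tables):
    """Return (allowed, reason); deny if any requested table is out of scope."""
    permitted = policy.allowed_tables.get(role, set())
    violations = sorted(set(requested_tables) - permitted)
    if violations:
        return False, f"access denied for role '{role}': {violations}"
    return True, "action permitted"


policy = Policy(allowed_tables={
    "nurse": {"vitalperiodic", "nursecharting"},
    "physician": {"diagnosis", "lab", "vitalperiodic"},
})

print(check_action(policy, "nurse", {"vitalperiodic"}))  # permitted
print(check_action(policy, "nurse", {"diagnosis"}))      # denied
```

In GuardAgent, code of this kind would be produced by the LLM from the safety guard request and in-context demonstrations, then executed to decide whether the target agent's action proceeds.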