GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

29 May 2025 | Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
GuardAgent is a novel framework designed to safeguard large language model (LLM) agents by dynamically checking whether their actions comply with given safety guard requests. It leverages knowledge-enabled reasoning to generate a task plan and then converts that plan into executable guardrail code. GuardAgent uses an LLM as its reasoning component, supplemented by in-context demonstrations retrieved from a memory module, which allows it to interpret diverse safety guard requests and produce reliable, flexible, and low-overhead code-based guardrails.

Two novel benchmarks are introduced to evaluate GuardAgent's effectiveness: EICU-AC for healthcare agents and Mind2Web-SC for web agents. GuardAgent achieves over 98% and 83% guardrail accuracy on these benchmarks, respectively. The framework is non-invasive: it adapts to new agents and new safety guard requests without affecting the target agents' task performance, requires no additional training, and can be extended with additional functions. Evaluated across a variety of tasks and agent types, GuardAgent outperforms existing baselines in both accuracy and reliability. The study highlights the importance of code-based guardrails for effective and flexible safety control in LLM agents.
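To make the idea of a code-based guardrail concrete, here is a minimal, hypothetical sketch in Python. In GuardAgent the check function would be generated by the LLM from the safety guard request; the role names, table names, and access policy below are illustrative assumptions (loosely inspired by the role-based access setting of EICU-AC), not details from the paper.

```python
# Hypothetical role-based access policy: which database tables each
# agent role may query. In GuardAgent, code like this would be generated
# from the natural-language safety guard request.
ALLOWED_TABLES = {
    "physician": {"diagnosis", "treatment", "lab"},
    "nursing": {"vitalperiodic", "medication"},
}

def guardrail_check(role: str, requested_tables: set) -> tuple:
    """Return (allowed, reason) for an agent's proposed table access.

    The guardrail runs before the target agent's action executes,
    so a denial blocks the action without modifying the agent itself.
    """
    permitted = ALLOWED_TABLES.get(role, set())
    denied = requested_tables - permitted
    if denied:
        return False, f"Access denied for role '{role}': {sorted(denied)}"
    return True, "Action complies with the safety guard request."

# Example: a nursing-role agent attempts to read the 'diagnosis' table.
ok, reason = guardrail_check("nursing", {"vitalperiodic", "diagnosis"})
```

Because the check is ordinary executable code rather than another LLM judgment, the same guardrail gives deterministic, auditable decisions on every invocation, which is the reliability argument the paper makes for code-based guardrails.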