CYBERSEC EVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

19 Apr 2024 | Manish Bhatt*, Sahana Chennabasappa*, Yue Li*, Cyrus Nikolaidis*, Daniel Song*, Shengye Wan*, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe*
CYBERSEC EVAL 2 is a comprehensive benchmark suite for quantifying and mitigating the cybersecurity risks of large language models (LLMs). It introduces two new testing areas, prompt injection and code interpreter abuse, expanding the suite's coverage to four categories. Evaluating multiple state-of-the-art LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama, the authors find that every model succumbs to between 26% and 41% of the prompt injection tests, indicating that conditioning LLMs to resist such attacks remains an unsolved problem (a sketch of this style of judge-based measurement appears below).

The paper also introduces the *safety-utility tradeoff*: conditioning an LLM to reject unsafe prompts causes it to falsely reject some benign prompts, lowering its utility. The False Refusal Rate (FRR) is proposed as a measure to quantify this tradeoff (see the second sketch below).

Additionally, the suite evaluates LLMs' ability to exploit software vulnerabilities. Models with stronger coding capabilities perform better, but further research is needed before LLMs become proficient at exploit generation. The code for CYBERSEC EVAL 2 is open source and can be used to evaluate other LLMs. The paper provides detailed descriptions of the new tests and case studies of popular models, highlighting the need for additional guardrails and further research to ensure the safe deployment of LLMs.
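The prompt injection results above are reported as the fraction of test cases in which the model follows the injected instructions rather than its trusted system prompt. As a rough illustration only — the data layout and the `query_model` and `judge_followed_injection` helpers below are hypothetical stand-ins, not CYBERSEC EVAL 2's actual API — a judge-based harness for measuring that rate might look like this:

```python
# Hypothetical sketch of a judge-based prompt injection harness.
# `query_model` and `judge_followed_injection` are assumed stand-ins,
# not functions from the CYBERSEC EVAL 2 codebase.
from dataclasses import dataclass


@dataclass
class InjectionCase:
    system_prompt: str   # trusted instructions the model should obey
    user_input: str      # untrusted content carrying the injected payload
    injected_goal: str   # what the attacker tries to make the model do


def query_model(system_prompt: str, user_input: str) -> str:
    """Stand-in for a call to the model under test."""
    raise NotImplementedError


def judge_followed_injection(response: str, injected_goal: str) -> bool:
    """Stand-in for a judge (e.g. a second LLM) deciding whether the
    response carried out the injected goal instead of the system prompt."""
    raise NotImplementedError


def injection_success_rate(cases: list[InjectionCase]) -> float:
    """Fraction of test cases where the injection succeeded."""
    successes = sum(
        judge_followed_injection(
            query_model(c.system_prompt, c.user_input), c.injected_goal
        )
        for c in cases
    )
    return successes / len(cases)
```

Under this framing, the paper's 26%-41% figures correspond to `injection_success_rate` values of 0.26 to 0.41 across the models tested.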
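FRR itself is a simple ratio: of the benign prompts that merely sound risky, what fraction does the model refuse? A minimal sketch, assuming each benign prompt has already been labeled refused/answered by an upstream refusal classifier (this is an illustration, not CYBERSEC EVAL 2's implementation):

```python
# Minimal FRR sketch. The `refusals` flags are assumed to come from an
# upstream refusal classifier; this is not CYBERSEC EVAL 2's implementation.
def false_refusal_rate(refusals: list[bool]) -> float:
    """Fraction of benign (but risky-sounding) prompts the model refused.

    A lower FRR means less utility is lost to over-cautious safety
    conditioning; comparing FRR against the model's rate of complying
    with genuinely unsafe prompts exposes the safety-utility tradeoff.
    """
    return sum(refusals) / len(refusals)


# Example: a model that refuses 7 of 100 benign prompts has FRR = 0.07.
print(false_refusal_rate([True] * 7 + [False] * 93))
```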