CYBERSEC EVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models


April 18, 2024 | Manish Bhatt*, Sahana Chennabasappa*, Yue Li*, Cyrus Nikolaidis*, Daniel Song*, Shengye Wan*, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe*
CYBERSEC EVAL 2 is a comprehensive benchmark suite designed to evaluate the cybersecurity risks and capabilities of large language models (LLMs). The benchmark introduces two new testing areas: prompt injection and code interpreter abuse. It evaluates multiple state-of-the-art LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Results show that prompt injection remains a significant risk, with injection attacks succeeding against the tested models between 26% and 41% of the time. The benchmark also introduces a measure of the safety-utility tradeoff, the False Refusal Rate (FRR), which quantifies how often a model falsely refuses borderline but benign prompts, capturing the tension between rejecting unsafe requests and preserving usefulness. In addition, the benchmark assesses LLMs' ability to exploit software vulnerabilities, finding that models with stronger coding capabilities perform better, though further research is needed before LLMs become proficient at exploit generation. The suite also includes tests for code interpreter abuse, which check whether models comply with malicious requests to misuse an attached interpreter; on average, the tested LLMs comply with about 35% of such requests. CYBERSEC EVAL 2 provides open-source code and evaluation artifacts, enabling others to build upon and improve the benchmark. The results indicate that LLMs still struggle to follow instructions securely and to resist adversarial inputs, underscoring the need for additional guardrails and safety tuning. Overall, the benchmark highlights the importance of evaluating LLMs for cybersecurity risks and the need for continued research to enhance their safety and utility.
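To make the reported metrics concrete, the sketch below shows how rates like the False Refusal Rate and the prompt injection success rate are typically computed from judged test cases: FRR is the fraction of benign prompts the model refused, and injection success is the fraction of injection test cases where the attack succeeded. This is a minimal illustration, not the benchmark's actual harness; the `JudgedCase` structure and field names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class JudgedCase:
    """One judged test case (hypothetical structure for illustration)."""
    is_benign: bool      # True if the prompt is a borderline-but-benign request
    model_refused: bool  # True if the model declined to answer

def false_refusal_rate(cases: list[JudgedCase]) -> float:
    """FRR = benign prompts falsely refused / total benign prompts."""
    benign = [c for c in cases if c.is_benign]
    if not benign:
        return 0.0
    return sum(c.model_refused for c in benign) / len(benign)

def injection_success_rate(attack_succeeded: list[bool]) -> float:
    """Fraction of prompt injection test cases where the attack succeeded."""
    return sum(attack_succeeded) / len(attack_succeeded) if attack_succeeded else 0.0

# Example: 2 of 8 benign prompts refused -> FRR = 0.25;
# 3 of 10 injection attempts succeed -> 30% success rate.
cases = [JudgedCase(is_benign=True, model_refused=(i < 2)) for i in range(8)]
attacks = [i < 3 for i in range(10)]
print(f"FRR: {false_refusal_rate(cases):.2f}")
print(f"Injection success rate: {injection_success_rate(attacks):.0%}")
```

A lower FRR at a fixed refusal rate for genuinely unsafe prompts indicates a better safety-utility balance, which is the tradeoff the benchmark's FRR metric is meant to surface.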