Is the System Message Really Important for Jailbreaks in Large Language Models?

Is the System Message Really Important for Jailbreaks in Large Language Models?

18 Jun 2024 | Anonymous ACL submission
The paper investigates the impact of system messages on the resistance of Large Language Models (LLMs) to "jailbreak" prompts, which are malicious inputs designed to elicit harmful responses from LLMs. The authors conduct experiments with mainstream LLMs (GPT3.5-turbo-0613, LLAMA2, VICUNA) using different system messages: short, long, and none. They find that varying system messages significantly affect the resistance to jailbreaks, with longer system messages generally showing better resistance. To enhance this resistance, the authors propose the System Messages Evolutionary Algorithm (SMEA), which uses evolutionary algorithms to generate more robust system messages. Through experiments, they demonstrate that SMEA can effectively reduce the attack success rate (ASR) of LLMs against jailbreak prompts, even with minor changes in the system messages. The study highlights the importance of system messages in enhancing LLM security and provides a novel approach to mitigate jailbreak threats.The paper investigates the impact of system messages on the resistance of Large Language Models (LLMs) to "jailbreak" prompts, which are malicious inputs designed to elicit harmful responses from LLMs. The authors conduct experiments with mainstream LLMs (GPT3.5-turbo-0613, LLAMA2, VICUNA) using different system messages: short, long, and none. They find that varying system messages significantly affect the resistance to jailbreaks, with longer system messages generally showing better resistance. To enhance this resistance, the authors propose the System Messages Evolutionary Algorithm (SMEA), which uses evolutionary algorithms to generate more robust system messages. Through experiments, they demonstrate that SMEA can effectively reduce the attack success rate (ASR) of LLMs against jailbreak prompts, even with minor changes in the system messages. The study highlights the importance of system messages in enhancing LLM security and provides a novel approach to mitigate jailbreak threats.
Reach us at info@study.space