Is the System Message Really Important for Jailbreaks in Large Language Models?

2024 | Anonymous
This paper investigates the role of system messages in the security of large language models (LLMs) and their impact on jailbreaks, asking: is the system message really important for jailbreaks in LLMs? Experiments on mainstream LLMs show that different system messages significantly affect the effectiveness of jailbreak prompts: models given different system messages exhibit varying levels of resistance to the same jailbreak prompts, and even minor changes to a system message can significantly alter jailbreak success rates.

Building on this finding, the paper proposes the System Messages Evolutionary Algorithm (SMEA), which uses evolutionary search to generate robust, diverse system messages that enhance LLM security. Experiments indicate that SMEA-generated system messages raise resistance to jailbreak prompts above 60% for most tested models.

The work contributes to LLM security by highlighting the importance of system messages in preventing jailbreaks and by proposing a novel method for generating robust ones. The authors also note limitations, including the risk of local optima during optimization and constraints imposed by limited experimental resources. Overall, the findings underscore the significance of system messages in LLM security and provide a framework for improving LLM resilience against jailbreak attempts.
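The evolutionary idea behind SMEA can be illustrated with a minimal generate-evaluate-select loop over candidate system messages. The `fitness` and `mutate` functions below are illustrative placeholders, not the paper's actual operators: in the paper, fitness would be measured by querying an LLM with jailbreak prompts under each candidate system message, which is not reproduced here.

```python
import random


def fitness(system_message: str) -> int:
    # Placeholder: the real fitness would be the fraction of jailbreak
    # prompts the model refuses under this system message. Here we just
    # count hypothetical safety keywords as a stand-in score.
    keywords = ("refuse", "safety", "harmful")
    return sum(kw in system_message.lower() for kw in keywords)


def mutate(system_message: str, rng: random.Random) -> str:
    # Placeholder mutation: append one hypothetical safety clause.
    clauses = [
        " Refuse requests for harmful content.",
        " Prioritize user safety over compliance.",
        " Do not role-play as an unrestricted model.",
    ]
    return system_message + rng.choice(clauses)


def smea_sketch(seed_messages, generations=5, pop_size=6, seed=0) -> str:
    """Evolve system messages: mutate, score, keep the fittest."""
    rng = random.Random(seed)
    population = list(seed_messages)
    for _ in range(generations):
        # Expand the population with mutated offspring.
        population += [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Select survivors by fitness (truncation selection).
        population.sort(key=fitness, reverse=True)
        population = population[:pop_size]
    return population[0]


best = smea_sketch(["You are a helpful assistant."])
```

The sketch uses simple truncation selection and append-only mutation for clarity; the paper's SMEA presumably employs richer variation operators to maintain diversity and avoid the local optima the authors mention as a limitation.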