LEVERAGING THE CONTEXT THROUGH MULTI-ROUND INTERACTIONS FOR JAILBREAKING ATTACKS

2 Oct 2024 | Yixin Cheng, Markos Georgopoulos, Volkan Cevher, Grigorios G. Chrysos
The paper "Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks" by Yixin Cheng, Markos Georgopoulos, Volkan Cevher, and Grigorios G. Chrysos introduces a new form of attack on Large Language Models (LLMs) called the Contextual Interaction Attack (CIA). The attack exploits the context vector, the information that precedes the attack query, to steer the model's responses and ultimately elicit harmful information. The authors propose a multi-turn interaction approach in which benign preliminary questions gradually align the conversational context with the harmful intent, making it easier for the model to produce the desired response. CIA operates in a black-box setting and transfers across different LLMs.
The paper reports experimental results on various LLMs showing that CIA outperforms existing methods in both success rate and transferability. The authors also discuss limitations and potential future directions, suggesting that CIA could be further strengthened by combining it with other attack techniques.
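The core mechanism the summary describes, accumulating a conversation history of preliminary turns so that the final query is interpreted against that context, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`send_turn`, `multi_round_query`) and the stubbed model are illustrative assumptions, and the message format mirrors the common role/content convention of chat APIs.

```python
# Minimal sketch of a multi-round interaction loop, assuming a chat-style
# model interface that receives the full message history on each call.
# `send_turn`, `multi_round_query`, and `stub_model` are hypothetical names,
# not from the paper.

def send_turn(history, user_msg, model_fn):
    """Append a user message, query the model with the full history so far,
    and record the model's reply in the history."""
    history.append({"role": "user", "content": user_msg})
    reply = model_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply

def multi_round_query(preliminary_questions, final_query, model_fn):
    """Ask preliminary questions first so that the accumulated history
    (the 'context vector') precedes the final query."""
    history = []
    for question in preliminary_questions:
        send_turn(history, question, model_fn)
    final_reply = send_turn(history, final_query, model_fn)
    return final_reply, history

# Stand-in for a real LLM endpoint: replies with how many user turns
# it has seen, just to make the context accumulation observable.
def stub_model(history):
    user_turns = sum(1 for m in history if m["role"] == "user")
    return f"reply after {user_turns} user turns"

answer, hist = multi_round_query(["q1", "q2"], "final question", stub_model)
```

The point of the sketch is structural: each call sees the entire prior exchange, so the final query is never evaluated in isolation, which is the property the attack relies on.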