Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

2024 | Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han
This paper introduces CoA (Chain of Attack), a semantic-driven contextual multi-turn attack method for large language models (LLMs). CoA dynamically adjusts its attack policy during multi-turn dialogue, using contextual feedback and semantic relevance to steer the target model toward generating harmful or biased content. Evaluated across a range of LLMs and datasets, it exposes vulnerabilities more effectively than existing attack methods, offering a new perspective and tool for the security and ethical assessment of dialogue systems — both for attacking and for defending LLMs.

The problem setup is to design an attack that is effective against LLMs in multi-turn dialogues, so as to reveal their security and ethical risks. The method proceeds in three stages. The Seed Attack Chain Generator creates candidate attack chains from the target task. The Attack Chain Executor systematically feeds each attack prompt in the chain to the target model and evaluates its responses. The Attack Chain Updater then adjusts the attack strategy based on semantic relevance and dialogue context: attack prompts are refined using the target model's responses, and policy selection is driven by incremental semantic relevance, with a context-driven attacker model realigning the policy with the attack objective at each turn.
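The summary describes this loop only in prose, so the following is a minimal sketch of how the three stages could compose — not the authors' implementation. Here `attacker`, `target`, `judge`, and `relevance` are hypothetical callables standing in for the attacker model, the target model, the response evaluator, and the semantic-relevance scorer, and the discard-and-regenerate step is one plausible reading of the updater:

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (attack prompt, target response)

def coa_attack(
    task: str,
    attacker: Callable[[List[Turn], str], str],  # drafts the next prompt
    target: Callable[[List[Turn], str], str],    # answers in dialogue context
    judge: Callable[[str, str], bool],           # is the attack objective met?
    relevance: Callable[[str, str], float],      # semantic relevance score
    max_turns: int = 5,
) -> List[Turn]:
    """Run one semantic-driven multi-turn attack; return the dialogue."""
    history: List[Turn] = []
    prev_score = 0.0
    # Seed Attack Chain Generator: draft the opening prompt from the task.
    prompt = attacker(history, task)
    for _ in range(max_turns):
        # Attack Chain Executor: query the target model in context,
        # then evaluate the response.
        response = target(history, prompt)
        history.append((prompt, response))
        if judge(response, task):
            return history  # objective reached
        # Attack Chain Updater: keep the current policy only while
        # semantic relevance to the objective increases across turns.
        score = relevance(response, task)
        if score <= prev_score:
            history.pop()  # stalled turn: drop it and regenerate
        else:
            prev_score = score
        prompt = attacker(history, task)
    return history
```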
Experiments on several datasets and language models show that CoA achieves high attack success rates and effectively induces LLMs to produce harmful content, outperforming existing attack methods both in success rate and in the vulnerabilities it exposes. The method applies across different models and datasets, highlighting its adaptability and robustness. The paper concludes with future work: exploring defenses against attacks in multi-turn conversations and analyzing attacks from an intrinsic-security perspective.
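The sketch above leaves the `relevance` score abstract. One concrete instantiation, under the assumption that relevance is cosine similarity over sentence embeddings (all-MiniLM-L6-v2 is our stand-in model; the summary does not name the paper's actual scorer):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed embedding model -- any sentence encoder would do here.
_embedder = SentenceTransformer("all-MiniLM-L6-v2")

def relevance(response: str, task: str) -> float:
    """Cosine similarity between a target response and the attack task."""
    a, b = _embedder.encode([response, task])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Requiring this score to increase turn over turn gives the updater a cheap progress signal: a turn that does not move the conversation semantically closer to the attack objective is discarded rather than kept in context.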