9 May 2024 | Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han
The paper introduces a novel method called CoA (Chain of Attack) for attacking large language models (LLMs) in multi-turn dialogues. CoA is a semantic-driven contextual multi-turn attack method that adaptively adjusts its attack policy through contextual feedback and semantic relevance during the dialogue. The method aims to expose vulnerabilities in LLMs by guiding them to produce unreasonable or harmful content. The authors evaluate CoA on various LLMs and datasets, demonstrating its effectiveness in triggering errors and biases in the models. The paper provides a new perspective and tool for both attacking and defending LLMs, contributing to the security and ethical assessment of dialogue systems. The code for CoA is available at: https://github.com/YancyKahn/CoA.
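To make the attack loop concrete, below is a minimal Python sketch of a semantic-driven multi-turn attack in the spirit of CoA. It is not the paper's implementation (see the GitHub repository for that): the functions `attacker_generate`, `target_respond`, `semantic_relevance`, and `judge_success` are hypothetical stubs standing in for the attacker LLM, the target LLM, a semantic-similarity scorer, and a success judge, and the threshold value is an illustrative assumption.

```python
import difflib

# Hypothetical stubs; a real system would call LLM APIs and an
# embedding-based similarity model instead.
def attacker_generate(history, goal):
    """Propose the next attack prompt given the dialogue so far."""
    return f"(turn {len(history) + 1}) question steering toward: {goal}"

def target_respond(history, prompt):
    """Query the target LLM (stubbed here)."""
    return f"response to: {prompt}"

def semantic_relevance(text, goal):
    """Crude string-overlap proxy for semantic relevance to the goal."""
    return difflib.SequenceMatcher(None, text, goal).ratio()

def judge_success(response, goal):
    """Judge whether the target produced the goal content (stubbed)."""
    return False

def chain_of_attack(goal, max_turns=5, relevance_threshold=0.2):
    """Multi-turn attack loop: each turn generates a prompt from context,
    queries the target, and uses relevance feedback to decide whether
    to keep the turn in the chain or discard it and re-plan."""
    history = []  # list of (prompt, response) turns kept so far
    for _ in range(max_turns):
        prompt = attacker_generate(history, goal)
        response = target_respond(history, prompt)
        if judge_success(response, goal):
            return history + [(prompt, response)], True
        if semantic_relevance(response, goal) >= relevance_threshold:
            # Relevant turn: keep it so later prompts build on this context.
            history.append((prompt, response))
        # Irrelevant turn: drop it; the attacker re-plans next round.
    return history, False

turns, succeeded = chain_of_attack("example attack goal")
print(f"chain length: {len(turns)}, success: {succeeded}")
```

The key design point this sketch illustrates is the feedback loop: unlike single-shot jailbreak prompts, each turn is scored against the attack goal, and only semantically relevant turns extend the dialogue context that the attacker builds on.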