Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

16 Feb 2024 | Shuai Zhao1 2, Meihuizi Jia 3 2, Luu Anh Tuan 2, Fengjun Pan 2, Jinming Wen1 4*
This paper explores the security vulnerabilities of large language models (LLMs) in the context of in-context learning (ICL), a paradigm that bridges pre-training and fine-tuning. The authors introduce a novel backdoor attack method called ICLAttack, which targets LLMs based on ICL without requiring fine-tuning. ICLAttack involves two types of attacks: poisoning demonstration examples and poisoning demonstration prompts. The method ensures that the labels of the poisoned examples remain correctly annotated, enhancing the stealth of the attack. Extensive experiments across various language models, ranging from 1.3B to 180B parameters, demonstrate the effectiveness of ICLAttack, achieving an average attack success rate of 95.0% on three datasets. The paper highlights the universal vulnerabilities of LLMs in ICL and calls for increased vigilance and research into defenses against such attacks.This paper explores the security vulnerabilities of large language models (LLMs) in the context of in-context learning (ICL), a paradigm that bridges pre-training and fine-tuning. The authors introduce a novel backdoor attack method called ICLAttack, which targets LLMs based on ICL without requiring fine-tuning. ICLAttack involves two types of attacks: poisoning demonstration examples and poisoning demonstration prompts. The method ensures that the labels of the poisoned examples remain correctly annotated, enhancing the stealth of the attack. Extensive experiments across various language models, ranging from 1.3B to 180B parameters, demonstrate the effectiveness of ICLAttack, achieving an average attack success rate of 95.0% on three datasets. The paper highlights the universal vulnerabilities of LLMs in ICL and calls for increased vigilance and research into defenses against such attacks.
Reach us at info@study.space