29 Feb 2024 | Hongyi Liu, Zirui Liu, Ruixiang Tang, Jiayi Yuan, Shaochen Zhong, Yu-Neng Chuang, Li Li†, Rui Chen†, Xia Hu
The paper "LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario" examines the security risks of using Low-Rank Adaptation (LoRA) with large language models (LLMs). LoRA is a popular, parameter-efficient method for fine-tuning LLMs, letting users customize models for specific tasks. Because LoRA modules are small and easy to distribute, they are routinely shared and plugged into other users' models, and this share-and-play ecosystem opens up new attack surfaces such as backdoor injection, where an attacker embeds malicious behavior in a LoRA module and distributes it widely. The study investigates how backdoors can be injected into LoRA modules and examines the infection mechanisms, finding that even training-free injection is possible. It also explores the effect of combining multiple LoRAs and the transferability of adversarial LoRAs across models.
The research aims to raise awareness of these security risks and to motivate proactive defenses before the consequences materialize. Key findings include the effectiveness of stealthy backdoor injection, the ability to preserve downstream functionality while embedding a backdoor, and the cross-model transferability of LoRA-based attacks. The study concludes by emphasizing the need to address these vulnerabilities to mitigate the risks posed by LoRA-as-an-Attack.
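To see why a shared adapter carries its behavior into any model that loads it, it helps to recall the LoRA mechanics: an adapter stores two low-rank factors A and B, and at load time the update B·A is merged into the frozen base weight. The sketch below is a minimal, hypothetical illustration of that merge with toy NumPy matrices (the dimensions and variable names are illustrative, not from the paper); whatever behavior is encoded in B·A, benign or backdoored, ships with the adapter.

```python
import numpy as np

# Minimal sketch of LoRA weight merging (toy dimensions, hypothetical names).
# A LoRA adapter stores a down-projection A (r x d_in) and an up-projection
# B (d_out x r); "playing" a shared adapter merges W' = W + B @ A into the
# frozen base weight W.

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2                 # toy sizes; real LLM layers are far larger

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection factor
B = rng.standard_normal((d_out, r))      # LoRA up-projection factor

delta = B @ A                            # rank-r update learned during fine-tuning
W_merged = W + delta                     # merging the shared adapter into the model

# The update is low-rank by construction, which is what makes LoRA modules
# small and easy to share -- and easy for an attacker to distribute.
assert np.linalg.matrix_rank(delta) <= r
```

Because the adapter is just these two small matrices, anyone who downloads and merges it inherits the full update, which is the attack surface the paper studies.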