Test-Time Backdoor Attacks on Multimodal Large Language Models

Test-Time Backdoor Attacks on Multimodal Large Language Models

2402.08577v1 13 Feb 2024 | Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
This paper introduces AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs). Unlike traditional backdoor attacks that require access to training data, AnyDoor injects a backdoor into the textual modality using adversarial test images with a universal perturbation, without modifying training data. It leverages techniques from universal adversarial attacks but distinguishes itself by decoupling the setup and activation of harmful effects. The attack can dynamically change its trigger prompts or harmful effects, posing new challenges for defense. AnyDoor is validated against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2. Comprehensive ablation studies show that AnyDoor is effective across various datasets, perturbation budgets, and trigger prompts. The attack is robust against common corruptions and can be applied to dynamic video scenarios. It also demonstrates effectiveness against a wide range of trigger-target combinations, including natural and synthetic data. The attack works by introducing a universal adversarial perturbation to input images, which is then used to set up a backdoor in the textual modality. When a trigger is present, the MLLM responds with harmful outputs. The attack can be applied to different modalities, with visual input suitable for setup and textual input for activation. The attack is robust to changes in trigger case and can be effective even with random trigger placement. The paper highlights the inherent vulnerabilities of MLLMs to well-crafted adversarial perturbations and underscores the need for improved defense mechanisms. AnyDoor demonstrates the potential for test-time backdoor attacks to exploit the multimodal capabilities of MLLMs, posing significant security risks. The results show that even advanced MLLMs are vulnerable to such attacks, emphasizing the importance of developing robust defenses against backdoor attacks.This paper introduces AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs). Unlike traditional backdoor attacks that require access to training data, AnyDoor injects a backdoor into the textual modality using adversarial test images with a universal perturbation, without modifying training data. It leverages techniques from universal adversarial attacks but distinguishes itself by decoupling the setup and activation of harmful effects. The attack can dynamically change its trigger prompts or harmful effects, posing new challenges for defense. AnyDoor is validated against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2. Comprehensive ablation studies show that AnyDoor is effective across various datasets, perturbation budgets, and trigger prompts. The attack is robust against common corruptions and can be applied to dynamic video scenarios. It also demonstrates effectiveness against a wide range of trigger-target combinations, including natural and synthetic data. The attack works by introducing a universal adversarial perturbation to input images, which is then used to set up a backdoor in the textual modality. When a trigger is present, the MLLM responds with harmful outputs. The attack can be applied to different modalities, with visual input suitable for setup and textual input for activation. The attack is robust to changes in trigger case and can be effective even with random trigger placement. The paper highlights the inherent vulnerabilities of MLLMs to well-crafted adversarial perturbations and underscores the need for improved defense mechanisms. AnyDoor demonstrates the potential for test-time backdoor attacks to exploit the multimodal capabilities of MLLMs, posing significant security risks. The results show that even advanced MLLMs are vulnerable to such attacks, emphasizing the importance of developing robust defenses against backdoor attacks.
Reach us at info@study.space