Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models

5 Feb 2024 | Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang

This paper introduces Shadowcast, a data poisoning attack that manipulates Vision-Language Models (VLMs) into producing misleading responses to benign inputs. VLMs, which integrate visual and textual capabilities, are vulnerable to data poisoning, in which an attacker tampers with training data to steer model behavior. Shadowcast is distinctive in that its poison samples pair visually indistinguishable images with matching, benign-looking texts, making them difficult to detect by inspection.

**Key Contributions:**

1. **First-of-its-kind study:** Shadowcast is the first demonstration that data poisoning attacks on VLMs are practical and can manipulate responses to everyday, benign prompts.
2. **Two attack types:** a Label Attack, in which poisoned VLMs misidentify class labels, and a Persuasion Attack, in which they generate persuasive narratives that mislead users.
3. **High effectiveness:** Shadowcast succeeds with as few as 50 poison samples injected into the training data.
4. **Transferability:** the attack remains effective across different VLM architectures and prompts, even in a black-box setting.
5. **Robustness:** Shadowcast withstands data augmentation and JPEG compression applied during training.

**Methods:**

- **Threat model:** the attacker aims to make a VLM misinterpret images of an original concept as a destination concept, expressed either through class labels or persuasive narratives.
- **Attacker capabilities:** the attacker can inject poison data into the training set, has access to representative images and texts, and crafts poison samples that are visually indistinguishable from benign ones; a sketch of this crafting step follows this list.
- **Model training:** the attack is evaluated on VLMs trained with visual instruction tuning, with a focus on the LLaVA-1.5 model.
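The summary describes the crafting step only at a high level. As a minimal sketch, assuming the attacker can query the victim's vision encoder (the transferability results suggest a surrogate encoder can also work), a poison image can be produced by starting from a destination-concept image and perturbing it within a small L-infinity budget until its encoder features approach those of an original-concept image; paired with a destination-concept caption, the sample looks benign yet teaches the model to tie original-concept features to destination-concept text. The function names and hyperparameters below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of Shadowcast-style poison crafting, assuming access to the
# victim VLM's vision encoder. All names and hyperparameters (eps, step,
# iters) are illustrative assumptions, not the authors' exact settings.
import torch

def craft_poison_image(vision_encoder, x_dest, x_orig,
                       eps=8 / 255, step=1 / 255, iters=200):
    """Perturb a destination-concept image (x_dest) within an L-infinity
    budget so its latent features match an original-concept image (x_orig).
    Both are float tensors in [0, 1] of shape (1, 3, H, W).
    """
    vision_encoder.eval()
    for p in vision_encoder.parameters():
        p.requires_grad_(False)                   # only the image is optimized
    with torch.no_grad():
        target_feat = vision_encoder(x_orig)      # features to imitate

    delta = torch.zeros_like(x_dest, requires_grad=True)
    for _ in range(iters):
        feat = vision_encoder(x_dest + delta)
        loss = torch.nn.functional.mse_loss(feat, target_feat)
        loss.backward()
        with torch.no_grad():
            # Signed-gradient descent on the feature-matching loss, then
            # projection onto the L-infinity ball and the valid pixel range.
            delta -= step * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((x_dest + delta).clamp(0, 1) - x_dest)
        delta.grad.zero_()

    # The poison sample pairs this image with a destination-concept caption:
    # it *looks* like the destination concept but *encodes* like the original,
    # so training on it links original-concept features to destination text.
    return (x_dest + delta).detach()
```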
**Experiments:**

- **Label Attack:** Shadowcast achieves high success rates with a small number of poison samples (a minimal success-rate measurement is sketched after the conclusion).
- **Persuasion Attack:** the attack steers responses into coherent, persuasive, and misleading narratives.
- **Generalizability:** Shadowcast remains effective across diverse prompts and across different VLM architectures.
- **Robustness:** the attack stays potent under data augmentation and JPEG compression (a compression round-trip check is sketched at the end of this summary).

**Conclusion:** Shadowcast highlights the critical risks that data poisoning poses to VLMs, emphasizing the need for high-quality training data and robust defense strategies. The study underscores the importance of vigilant data-examination practices and the development of effective defenses for the secure deployment of VLMs.
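To make the Label Attack evaluation concrete, here is a minimal sketch of how a success rate might be measured: query the poisoned model with clean, held-out images of the original concept and count how often the answer names the destination concept instead. The `poisoned_vlm.generate` interface and the keyword matching are simplifying assumptions; the paper's exact metric implementation may differ.

```python
# Minimal sketch of a Label Attack success-rate measurement. The model
# interface and keyword matching are assumptions for illustration only.
def label_attack_success_rate(poisoned_vlm, test_images, prompt,
                              destination_keywords):
    hits = 0
    for image in test_images:
        response = poisoned_vlm.generate(image=image, prompt=prompt)
        if any(kw.lower() in response.lower() for kw in destination_keywords):
            hits += 1      # the model emitted the attacker's target concept
    return hits / len(test_images)

# Hypothetical usage: clean original-concept images, destination keywords.
# asr = label_attack_success_rate(model, original_concept_images,
#                                 "Who is this person?", ["destination name"])
```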
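Finally, the robustness claim can be probed with a simple JPEG round-trip: compress a crafted poison image and check whether its features still sit near the original concept. This is a plausibility check under assumed settings (quality=75), not the paper's evaluation code.

```python
# Minimal sketch of a JPEG-compression robustness check for a poison image.
# quality=75 is an assumed setting; the paper may test other strengths.
import io
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def jpeg_roundtrip(x, quality=75):
    """JPEG-compress a (1, 3, H, W) image tensor in [0, 1] and decode it."""
    buf = io.BytesIO()
    to_pil_image(x.squeeze(0)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf)).unsqueeze(0)

# Hypothetical check: if the attack is robust, the feature distance to the
# original concept should stay small even after compression.
# x_jpeg = jpeg_roundtrip(x_poison)
# dist = (vision_encoder(x_jpeg) - vision_encoder(x_orig)).norm()
```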