Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models

Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models

5 Feb 2024 | Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang
This paper introduces Shadowcast, a stealthy data poisoning attack against Vision-Language Models (VLMs). The attack manipulates VLMs to misinterpret images from one concept as belonging to a different concept, either by misidentifying class labels (Label Attack) or by generating misleading narratives (Persuasion Attack). Shadowcast uses visually indistinguishable poison samples that are crafted from clean images and their corresponding text descriptions. These samples are designed to be imperceptible to humans but effective in altering VLM responses. The attack is effective with as few as 50 poison samples and remains effective across various prompts and VLM architectures. Shadowcast is also robust against data augmentation and image compression techniques. The study demonstrates that poisoned VLMs can generate convincing yet deceptive misinformation, highlighting the importance of data quality for responsible deployment of VLMs. The paper evaluates Shadowcast on two benchmarks, GQA and VizWiz, showing that poisoned models maintain similar performance to clean models. The attack is successful in both Label Attack and Persuasion Attack scenarios, with high success rates even when using a small number of poison samples. Human evaluation confirms that poisoned models generate coherent and persuasive responses, effectively altering user perceptions. The study also shows that Shadowcast is transferable across different VLM architectures and remains effective in black-box settings. The results underscore the significant risks of data poisoning against VLMs and the need for robust data cleaning and defensive strategies. The paper concludes that data poisoning attacks on VLMs are a critical security concern, and further research is needed to develop effective defenses against such attacks.This paper introduces Shadowcast, a stealthy data poisoning attack against Vision-Language Models (VLMs). The attack manipulates VLMs to misinterpret images from one concept as belonging to a different concept, either by misidentifying class labels (Label Attack) or by generating misleading narratives (Persuasion Attack). Shadowcast uses visually indistinguishable poison samples that are crafted from clean images and their corresponding text descriptions. These samples are designed to be imperceptible to humans but effective in altering VLM responses. The attack is effective with as few as 50 poison samples and remains effective across various prompts and VLM architectures. Shadowcast is also robust against data augmentation and image compression techniques. The study demonstrates that poisoned VLMs can generate convincing yet deceptive misinformation, highlighting the importance of data quality for responsible deployment of VLMs. The paper evaluates Shadowcast on two benchmarks, GQA and VizWiz, showing that poisoned models maintain similar performance to clean models. The attack is successful in both Label Attack and Persuasion Attack scenarios, with high success rates even when using a small number of poison samples. Human evaluation confirms that poisoned models generate coherent and persuasive responses, effectively altering user perceptions. The study also shows that Shadowcast is transferable across different VLM architectures and remains effective in black-box settings. The results underscore the significant risks of data poisoning against VLMs and the need for robust data cleaning and defensive strategies. The paper concludes that data poisoning attacks on VLMs are a critical security concern, and further research is needed to develop effective defenses against such attacks.
Reach us at info@study.space