16 Feb 2024 | Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
Large vision-language models (LVLMs) are vulnerable to typographic attacks, in which deceptive text is superimposed on an image to mislead the model. This paper introduces a benchmark for evaluating typographic attacks against LVLMs and proposes two novel self-generated attack methods: Class-Based and Descriptive Attacks. A Class-Based Attack asks the LVLM to identify the class most similar to the target class, while a Descriptive Attack asks the model to recommend a typographic attack that pairs a deceptive class with a supporting description. Both are more effective than prior random-class attacks, reducing LVLM classification performance by up to 33%. The study shows that attacks generated by one model (e.g., GPT-4V or LLaVA) are effective against that model itself and transfer to other models such as InstructBLIP and MiniGPT4. The results indicate that Descriptive Attacks outperform Class-Based Attacks because they exploit the sophisticated language understanding of LVLMs. The paper also highlights the importance of addressing typographic attacks, since they can significantly impact model performance and safety, and suggests that future research should focus on improving the robustness of LVLMs against them.
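To make the mechanics concrete, below is a minimal sketch (not the paper's implementation) of how deceptive text could be superimposed on an image with Pillow, along with prompts in the spirit of the two self-generated attack variants. The function names, prompt wording, and the `query_lvlm` call are illustrative assumptions, not the authors' code.

```python
from PIL import Image, ImageDraw, ImageFont

def apply_typographic_attack(image_path, attack_text,
                             position=(10, 10), out_path="attacked.png"):
    """Superimpose deceptive text onto an image (the typographic attack)."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a stronger attack would use a larger, legible font
    draw.text(position, attack_text, fill="white", font=font)
    img.save(out_path)
    return out_path

# Illustrative prompts for the two self-generated attack variants.
true_class = "golden retriever"  # assumed ground-truth label for the example image
class_based_prompt = (
    f"The image shows a {true_class}. Which other class is most visually similar "
    f"and would be the most deceptive label to print on the image?"
)
descriptive_prompt = (
    f"The image shows a {true_class}. Recommend a typographic attack: give a "
    f"deceptive class plus a short description that makes it more convincing."
)

# `query_lvlm` is a hypothetical wrapper around an LVLM API (GPT-4V, LLaVA, etc.).
# attack_text = query_lvlm("dog.jpg", descriptive_prompt)
# apply_typographic_attack("dog.jpg", attack_text)
```

The attacked image would then be fed back to the same or a different LVLM for classification to measure how much the superimposed text degrades accuracy.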