Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images

18 Mar 2024 | Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
This paper investigates the adversarial robustness of multimodal large language models (MLLMs) that employ chain-of-thought (CoT) reasoning. The authors evaluate how CoT reasoning affects robustness against adversarial images and introduce a novel stop-reasoning attack that bypasses the robustness CoT provides. They find that while CoT reasoning marginally improves robustness against existing attack methods, it is not sufficient to protect MLLMs from adversarial images. The stop-reasoning attack proves highly effective at disrupting the CoT reasoning process, leading to incorrect predictions.

The study also shows that adversarial images alter the content of the CoT rationale itself, offering insight into how MLLMs reason under attack and underscoring the value of inspecting the rationale behind model predictions. Overall, the work demonstrates the limits of CoT reasoning as a defense, introduces an attack that undermines its benefits, and highlights the need for further research into defending MLLMs against adversarial attacks, particularly when CoT reasoning is used.
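For context, adversarial-image attacks of this kind are typically framed as a constrained optimization over the input pixels. The sketch below is a generic PGD-style targeted attack and is not taken from the paper; the `target_nll` callable (the negative log-likelihood of a chosen target output under the victim MLLM), the parameter values, and the function name are assumptions introduced purely for illustration.

```python
# Minimal PGD-style sketch of a targeted adversarial-image attack.
# NOTE: illustrative only, not the authors' implementation. `target_nll`
# is a hypothetical callable that returns the (differentiable) negative
# log-likelihood of a chosen target string under the victim MLLM, given
# the perturbed image.
import torch

def pgd_attack(image, target_nll, eps=8 / 255, alpha=1 / 255, steps=40):
    """Projected gradient descent under an L_inf budget `eps`."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = target_nll(adv)                # lower loss => target text more likely
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()                 # targeted descent step
            adv = image + (adv - image).clamp(-eps, eps)    # project into eps-ball
            adv = adv.clamp(0.0, 1.0)                       # keep valid pixel range
    return adv.detach()
```

In a stop-reasoning-style setting, one could plausibly choose the target text to be an answer emitted without any intermediate rationale, so that minimizing its negative log-likelihood pushes the model to skip its CoT steps; the paper itself should be consulted for the exact objective used.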