Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images


18 Mar 2024 | Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
This paper investigates the impact of Chain-of-Thought (CoT) reasoning on the adversarial robustness of Multimodal Large Language Models (MLLMs). CoT reasoning, which generates intermediate steps to explain model decisions, has been shown to enhance both performance and explainability. However, the robustness of MLLMs against adversarial images, which can severely degrade their performance, remains a significant concern. The study evaluates three attack methods, the Answer Attack, the Rationale Attack, and the Stop-Reasoning Attack, to assess how effectively CoT mitigates such attacks. Key findings include:

- CoT marginally improves adversarial robustness against existing attacks, namely the Answer Attack and the Rationale Attack.
- The Stop-Reasoning Attack, which interrupts the CoT reasoning process, is highly effective in bypassing the enhanced robustness provided by CoT.
- CoT reasoning introduces explainability by providing intermediate steps that can be used to understand why MLLMs make incorrect predictions under adversarial conditions.

The paper also notes the limitations of the attacks, which rely on first-order gradients and white-box access to the models, and suggests future research directions for improving the robustness of MLLMs against adversarial attacks. Overall, the study underscores the need for a nuanced understanding of the interplay between reasoning processes and robustness in multimodal models.
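To make the attack setup concrete, the sketch below illustrates a first-order, white-box image attack of the kind the summary describes, in the spirit of the Answer Attack: projected gradient descent that perturbs the image so the model's likelihood of the correct answer tokens drops. This is a minimal sketch, not the authors' implementation; `model`, `prompt_ids`, and `answer_ids` are hypothetical stand-ins for an MLLM forward pass and its tokenized inputs.

```python
# Minimal PGD sketch of a first-order, white-box image attack on an MLLM.
# `model`, `prompt_ids`, and `answer_ids` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def pgd_answer_attack(model, image, prompt_ids, answer_ids,
                      eps=8 / 255, alpha=1 / 255, steps=40):
    """Perturb `image` within an L-inf ball of radius `eps` to maximize the
    loss on the ground-truth answer tokens (Answer Attack style)."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        # Hypothetical interface: the MLLM returns token logits for the
        # answer positions given the (image, prompt) pair.
        logits = model(images=adv, input_ids=prompt_ids).logits[:, -answer_ids.size(1):]
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               answer_ids.reshape(-1))
        grad = torch.autograd.grad(loss, adv)[0]
        # Maximize the answer loss: step in the direction of the gradient sign.
        adv = adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean image and the valid pixel range.
        adv = image + torch.clamp(adv - image, -eps, eps)
        adv = adv.clamp(0, 1)
    return adv.detach()
```

Under the same first-order, white-box setup, the Rationale Attack and the Stop-Reasoning Attack can be viewed as swapping in a different target for the loss (the rationale tokens, or a target response that skips the reasoning steps); the exact loss formulations are defined in the paper rather than reproduced here.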