Understanding Red Teaming GPT-4V%3A Are GPT-4V Safe Against Uni%2FMulti-Modal Jailbreak Attacks%3F

This paper addresses the vulnerability of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to jailbreak attacks, which can bypass safety measures and generate harmful content. The authors construct a comprehensive evaluation benchmark with 1445 harmful questions covering 11 different safety policies. They conduct extensive red-teaming experiments on 11 different LLMs and MLLMs, including both proprietary and open-source models. The results show that GPT-4 and GPT-4V demonstrate better robustness against both textual and visual jailbreak methods compared to open-source models. Specifically, Llama2 and Qwen-VL-Chat are more robust among open-source models. The study also finds that visual jailbreak methods have limited transferability compared to textual methods. The paper provides a detailed analysis of the robustness of different models and the effectiveness of various attack methods, contributing to a better understanding of the security of LLMs and MLLMs.This paper addresses the vulnerability of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to jailbreak attacks, which can bypass safety measures and generate harmful content. The authors construct a comprehensive evaluation benchmark with 1445 harmful questions covering 11 different safety policies. They conduct extensive red-teaming experiments on 11 different LLMs and MLLMs, including both proprietary and open-source models. The results show that GPT-4 and GPT-4V demonstrate better robustness against both textual and visual jailbreak methods compared to open-source models. Specifically, Llama2 and Qwen-VL-Chat are more robust among open-source models. The study also finds that visual jailbreak methods have limited transferability compared to textual methods. The paper provides a detailed analysis of the robustness of different models and the effectiveness of various attack methods, contributing to a better understanding of the security of LLMs and MLLMs.

RED TEAMING GPT-4V: ARE GPT-4V SAFE AGAINST UNI/MULTI-MODAL JAILBREAK ATTACKS?

4 Apr 2024 | Shuo Chen1,3 Zhen Han1 Bailan He1,3 Zifeng Ding3 Wenqian Yu5 Philip Torr2 Volker Tresp1,4 Jindong Gu2*