This paper investigates the robustness of GPT-4 and GPT-4V against jailbreak attacks on both text and visual modalities. The authors construct a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. They conduct extensive red-teaming experiments on 11 different LLMs and MLLMs, including both proprietary and open-source models. The results show that GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source models. Among open-source models, Llama2 and Qwen-VL-Chat show better robustness, with Llama2 being more robust than GPT-4. Visual jailbreak methods have relatively limited transferability compared to textual methods. The study also finds that AutoDAN has better transferability than GCG. The results indicate that GPT-4 and GPT-4V are significantly more robust than open-source models, especially against visual jailbreak attacks. The study highlights the importance of safety alignment and fine-tuning in improving model robustness against jailbreak attacks. The authors also find that the current defense mechanisms of open-source models are not as effective as those of closed-source models. The study concludes that while GPT-4 and GPT-4V are more robust, they are not completely immune to jailbreak attacks. The results suggest that future research should focus on improving the robustness of open-source models and developing more effective defense mechanisms against jailbreak attacks.This paper investigates the robustness of GPT-4 and GPT-4V against jailbreak attacks on both text and visual modalities. The authors construct a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. They conduct extensive red-teaming experiments on 11 different LLMs and MLLMs, including both proprietary and open-source models. The results show that GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source models. Among open-source models, Llama2 and Qwen-VL-Chat show better robustness, with Llama2 being more robust than GPT-4. Visual jailbreak methods have relatively limited transferability compared to textual methods. The study also finds that AutoDAN has better transferability than GCG. The results indicate that GPT-4 and GPT-4V are significantly more robust than open-source models, especially against visual jailbreak attacks. The study highlights the importance of safety alignment and fine-tuning in improving model robustness against jailbreak attacks. The authors also find that the current defense mechanisms of open-source models are not as effective as those of closed-source models. The study concludes that while GPT-4 and GPT-4V are more robust, they are not completely immune to jailbreak attacks. The results suggest that future research should focus on improving the robustness of open-source models and developing more effective defense mechanisms against jailbreak attacks.