3 Jul 2024 | Zonghao Ying, Aishan Liu, Xianglong Liu, and Dacheng Tao
This paper presents an empirical study on the safety of GPT-4o, evaluating its resilience against jailbreak attacks across three modalities: text, speech, and image. The study uses a series of jailbreak attacks on four benchmark datasets, involving over 4,000 initial text queries and nearly 8,000 responses. The results reveal that GPT-4o has enhanced safety in text modality jailbreaks compared to previous versions like GPT-4V. However, text-based jailbreak attacks show strong transferability, effectively compromising multimodal models like GPT-4o. The introduction of the audio modality opens new attack vectors for jailbreak attacks on GPT-4o. Existing black-box multimodal jailbreak methods are largely ineffective against GPT-4o and GPT-4V. The study also finds that attacks based on known jailbreak templates are comparatively ineffective, indicating OpenAI's proactive efforts in mitigating well-known jailbreak patterns.
The paper evaluates GPT-4o against unimodal jailbreak attacks, including text and audio modalities. In text modality, GPT-4o shows higher safety compared to GPT-4V, but text-based jailbreak attacks have transferability. In audio modality, GPT-4o demonstrates sufficient safety, as audio-based jailbreak attacks are less effective. The study also evaluates multimodal jailbreak attacks, finding that GPT-4o is more vulnerable to such attacks compared to GPT-4V. The results indicate that current multimodal jailbreak methods are ineffective against GPT-4o and GPT-4V. The study highlights the need for robust alignment guardrails in large models to ensure safety. The code for this study is available at https://github.com/NY1024/Jailbreak_GPT4o.This paper presents an empirical study on the safety of GPT-4o, evaluating its resilience against jailbreak attacks across three modalities: text, speech, and image. The study uses a series of jailbreak attacks on four benchmark datasets, involving over 4,000 initial text queries and nearly 8,000 responses. The results reveal that GPT-4o has enhanced safety in text modality jailbreaks compared to previous versions like GPT-4V. However, text-based jailbreak attacks show strong transferability, effectively compromising multimodal models like GPT-4o. The introduction of the audio modality opens new attack vectors for jailbreak attacks on GPT-4o. Existing black-box multimodal jailbreak methods are largely ineffective against GPT-4o and GPT-4V. The study also finds that attacks based on known jailbreak templates are comparatively ineffective, indicating OpenAI's proactive efforts in mitigating well-known jailbreak patterns.
The paper evaluates GPT-4o against unimodal jailbreak attacks, including text and audio modalities. In text modality, GPT-4o shows higher safety compared to GPT-4V, but text-based jailbreak attacks have transferability. In audio modality, GPT-4o demonstrates sufficient safety, as audio-based jailbreak attacks are less effective. The study also evaluates multimodal jailbreak attacks, finding that GPT-4o is more vulnerable to such attacks compared to GPT-4V. The results indicate that current multimodal jailbreak methods are ineffective against GPT-4o and GPT-4V. The study highlights the need for robust alignment guardrails in large models to ensure safety. The code for this study is available at https://github.com/NY1024/Jailbreak_GPT4o.