Jailbreaking Attack against Multimodal Large Language Model

4 Feb 2024 | Zhenxing Niu*, Haodong Ren*, Xinbo Gao, Gang Hua, Rong Jin
This paper focuses on jailbreaking attacks against multi-modal large language models (MLLMs), aiming to elicit these models to generate objectionable responses to harmful user queries. The authors propose a maximum likelihood-based algorithm to find an *image Jailbreaking Prompt* (imgJP), enabling jailbreaks across multiple unseen prompts and images (a data-universal property). The approach exhibits strong model transferability: the generated imgJP can be transferred, in a black-box manner, to various models, including MiniGPT-v2, LLaVA, InstructBLIP, and mPLUG-Owl2. The paper also reveals a connection between MLLM-jailbreaks and LLM-jailbreaks, introducing a construction-based method that harnesses the approach for LLM-jailbreaks and demonstrates superior efficiency compared to current state-of-the-art methods. The code is available at [https://github.com/ZhenxingNiu/Jailbreaking-Attack](https://github.com/ZhenxingNiu/Jailbreaking-Attack).
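At a high level, the maximum-likelihood formulation searches for a bounded image perturbation that maximizes the likelihood of target responses across a batch of prompts. Below is a minimal PyTorch-style sketch of such an optimization loop, for illustration only; `model.target_nll` is a hypothetical helper standing in for "negative log-likelihood of the target tokens given (image, prompt)", and the actual interfaces, loop structure, and hyperparameters in the authors' repository will differ.

```python
import torch

def find_imgJP(model, prompts, targets, image,
               steps=1000, eps=16 / 255, alpha=1 / 255):
    """Sketch: optimize an image perturbation (imgJP) that maximizes the
    likelihood of the target responses across a batch of prompts."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = torch.zeros(())
        for prompt, target in zip(prompts, targets):
            # `target_nll` is a hypothetical helper: NLL of the target
            # tokens given (image + delta, prompt). Minimizing it
            # maximizes log p(target | imgJP, prompt).
            loss = loss + model.target_nll(image + delta, prompt, target)
        loss.backward()
        with torch.no_grad():
            # Signed-gradient descent step, projected to an L-inf ball,
            # so the perturbation stays visually bounded.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad = None
    return (image + delta).clamp(0, 1).detach()
```

Averaging the loss over a batch of prompts, rather than a single one, is what the abstract refers to as the data-universal property: the same optimized image generalizes to unseen prompts.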