Safety of Multimodal Large Language Models on Images and Texts

20 Jun 2024 | Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao
This paper presents a comprehensive survey of the safety of Multimodal Large Language Models (MLLMs) on images and texts. The authors analyze the current state of research on evaluating, attacking, and defending against safety risks in MLLMs. They begin with an overview of MLLMs and the concept of safety, followed by a review of evaluation datasets and metrics for measuring MLLM safety. They then present attack and defense techniques related to MLLM safety, and finally discuss unsolved issues and promising research directions.

MLLMs can handle both text and images and have attracted significant attention for their multimodal potential. However, they are vulnerable to unsafe instructions, which can lead to harmful outputs. The authors identify three main risks introduced by the visual modality: (1) adversarial perturbations to images enable effective attacks at low cost; (2) MLLMs built on aligned LLMs may follow harmful instructions embedded in images, because their OCR capability lets them read text that text-only safeguards never see; (3) cross-modal training can weaken the safety alignment of the underlying aligned LLM.

The authors summarize research progress from three perspectives: evaluation, attack, and defense. They compare safety evaluation datasets and metrics, give a systematic review of attack and defense approaches, and anticipate future research opportunities to inspire further work on MLLM safety.

The paper discusses various attack methods, including adversarial attacks and visual prompt injection. Adversarial attacks add small, carefully crafted perturbations to images to manipulate the model's output. Visual prompt injection embeds malicious text directly in an image to steer the model's response.
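To make the adversarial-attack idea above concrete, the sketch below takes a single FGSM-style gradient step that nudges an image's embedding toward an attacker-chosen target. The tiny convolutional encoder and the target embedding are hypothetical stand-ins for an MLLM's vision tower and the attacker's objective; they are not taken from any surveyed work.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an MLLM vision encoder; a real attack would target
# the model's actual (usually frozen) vision tower, e.g. a CLIP-style ViT.
encoder = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 16),
)

image = torch.rand(1, 3, 224, 224)  # benign input image, pixels in [0, 1]
target = torch.randn(1, 16)         # attacker-chosen target embedding (placeholder)
epsilon = 8 / 255                   # L-infinity perturbation budget

# One FGSM-style step: move the image's embedding toward the target embedding.
image.requires_grad_(True)
loss = nn.functional.mse_loss(encoder(image), target)
loss.backward()

adv_image = (image - epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()
print("max pixel change:", (adv_image - image.detach()).abs().max().item())
```

In practice such attacks iterate this step many times while keeping the perturbation within the budget, which is what makes them cheap yet effective against the visual modality.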
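Safety evaluations of such attacks are commonly scored with attack success rate (ASR): the fraction of harmful prompts for which the model produces a non-refusal response. Below is a minimal sketch; the keyword-based refusal judge is purely illustrative, since the surveyed evaluations typically rely on human annotators or LLM judges.

```python
# Minimal attack-success-rate (ASR) computation over responses already collected
# from the model under test on a set of harmful prompts.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry", "i won't")

def is_refusal(response: str) -> bool:
    """Naive keyword judge; real evaluations use human or LLM-based judges."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of responses that are not refusals, i.e. where the attack 'worked'."""
    if not responses:
        return 0.0
    successes = sum(1 for r in responses if not is_refusal(r))
    return successes / len(responses)

print(attack_success_rate(["I'm sorry, I can't help with that.", "Sure, here is ..."]))
```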
The authors also discuss defense techniques, including inference-time and training-time alignment methods (a simplified inference-time guard is sketched at the end of this summary). The paper concludes that many challenges remain in ensuring the safety of MLLMs and that further research is needed to address them. The authors suggest that future work should focus on improved safety evaluation, in-depth study of safety risks, and safety alignment, and they emphasize the importance of balancing safety and utility in MLLMs.
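As a concrete, deliberately simplified illustration of the inference-time defenses mentioned above, the wrapper below screens both the user's text instruction and the model's draft response before returning anything. The generate and safety_judge functions are hypothetical placeholders rather than APIs from any surveyed system; production defenses would typically use a dedicated moderation model instead of a keyword list.

```python
# Sketch of an inference-time guardrail around a hypothetical MLLM `generate`
# call; `generate` and `safety_judge` are placeholders for illustration only.
REFUSAL = "I can't help with that request."

def safety_judge(text: str) -> bool:
    """Return True if the text looks unsafe. Stand-in for a moderation model."""
    blocked_terms = ("counterfeit", "steal credentials")  # toy denylist
    lowered = text.lower()
    return any(term in lowered for term in blocked_terms)

def generate(image_path: str, prompt: str) -> str:
    """Placeholder for the underlying MLLM inference call."""
    return f"(model response to {prompt!r} about {image_path})"

def guarded_generate(image_path: str, prompt: str) -> str:
    # 1) Screen the incoming text instruction before it reaches the model.
    if safety_judge(prompt):
        return REFUSAL
    # 2) Generate a draft response, then screen it before returning it.
    draft = generate(image_path, prompt)
    return REFUSAL if safety_judge(draft) else draft

print(guarded_generate("chart.png", "Describe this chart."))
```

Training-time alignment, by contrast, bakes such behavior into the model itself (e.g., via safety-focused supervised fine-tuning or preference optimization) rather than filtering at inference.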