Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

15 Jul 2024 | Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang
The paper "Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation" addresses safety vulnerabilities in multimodal large language models (MLLMs). Despite their impressive reasoning abilities, MLLMs are susceptible to jailbreak attacks, particularly when image inputs are introduced. The authors propose ECSO (Eyes Closed, Safety On), a training-free approach that leverages the intrinsic safety mechanisms of pre-aligned LLMs to enhance the safety of MLLMs. ECSO first assesses the safety of the model's initial response to a query with an image. If the response is deemed unsafe, ECSO converts the image into text using a query-aware image-to-text (I2T) transformation, reducing the MLLM to a text-only LLM; the MLLM then generates a response without the image, restoring the underlying LLM's safety mechanism. Experiments on five state-of-the-art MLLMs demonstrate significant safety improvements (e.g., 37.6% on MM-SafetyBench and 71.3% on VLSafe) while maintaining utility on common benchmarks. Additionally, ECSO can serve as a data engine for generating supervised fine-tuning (SFT) data for MLLM alignment without additional human intervention. The paper also includes a detailed analysis of the safety awareness of MLLMs and an evaluation of ECSO's effectiveness through various experiments.
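The ECSO pipeline described above can be sketched as a short control flow: answer, self-assess, and if unsafe, fall back to caption-then-answer. This is a minimal illustrative sketch; all four model-call functions below are hypothetical stand-ins for the MLLM/LLM invocations, not the authors' actual API.

```python
def mllm_answer(query, image):
    # Stand-in for the MLLM's normal multimodal generation (assumption).
    return f"answer to '{query}' given image {image!r}"

def is_unsafe(response):
    # Stand-in for ECSO's self-assessment step, in which the MLLM is
    # prompted to judge the harmfulness of its own response (assumption:
    # here we just keyword-match for demonstration).
    return "harmful" in response.lower()

def image_to_text(query, image):
    # Stand-in for the query-aware image-to-text (I2T) transformation:
    # caption the image with respect to the query (assumption).
    return f"caption of {image!r} relevant to '{query}'"

def llm_answer(query, caption):
    # Stand-in for text-only generation, which restores the aligned
    # LLM's intrinsic safety mechanism (assumption).
    return f"safe answer to '{query}' using caption '{caption}'"

def ecso(query, image):
    """Training-free ECSO flow: answer, self-check, regenerate if unsafe."""
    response = mllm_answer(query, image)
    if not is_unsafe(response):
        return response                    # step 1: initial response judged safe
    caption = image_to_text(query, image)  # step 2: query-aware I2T transformation
    return llm_answer(query, caption)      # step 3: regenerate "eyes closed"
```

Because every step is an inference-time prompt to the same model, the approach needs no fine-tuning, which is also what lets it double as a data engine: the safe regenerated responses can be collected as SFT targets.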