Cross-Modal Safety Alignment: Is textual unlearning all you need?


27 May 2024 | Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
This paper investigates whether textual unlearning alone is sufficient for cross-modal safety alignment in Vision-Language Models (VLMs). The challenge is to ensure that VLMs produce harmless outputs across modalities, such as text and vision, while preserving their utility.

The authors evaluate how well textual unlearning reduces the Attack Success Rate (ASR) for both text-based and vision-text-based attacks. They show that it lowers ASR to less than 8%, and in some cases to nearly 2%, while the model's utility is preserved.

The paper also compares textual unlearning with multi-modal unlearning and supervised fine-tuning (SFT). Multi-modal unlearning offers no significant advantage over textual unlearning, is less effective at reducing harmful outputs, and requires up to six times more computation. The study therefore concludes that textual unlearning is both more effective and more computationally efficient for cross-modal safety alignment.

The proposed textual unlearning method modifies only the language-model component of the VLM to avoid generating harmful content. The approach combines three loss terms: one to minimize the generation of harmful outputs, one to maximize the generation of helpful responses to harmful inputs, and one to maintain useful outputs for normal inputs (see the illustrative sketch below). The results show that textual unlearning outperforms both multi-modal unlearning and SFT in effectiveness and computational efficiency. The study also examines the environmental impact of the different approaches, finding that textual unlearning has a significantly lower carbon footprint than multi-modal unlearning and SFT, making it the more sustainable option. Overall, the paper concludes that textual unlearning is a promising approach for achieving high levels of harmlessness in VLMs while maintaining their utility.
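The following is a minimal, hypothetical PyTorch-style sketch of how the three loss terms described above could be combined. It assumes a Hugging Face-style language model whose forward pass returns a `.loss`, simple negative-log-likelihood formulations for all three terms, and placeholder batch names and weights (`lambda_*`); the paper's exact loss definitions (for example, whether the utility term uses a KL penalty against a reference model) may differ.

```python
import torch

# Hypothetical sketch of the three-term textual-unlearning objective.
# `model` is the language-model component of the VLM. Batch names and
# lambda weights are illustrative placeholders, not the paper's values.

def textual_unlearning_loss(model,
                            harmful_batch,        # harmful prompts + harmful responses to forget
                            helpful_refusal_batch,  # same harmful prompts + safe, helpful responses
                            utility_batch,          # normal prompts + normal useful responses
                            lambda_forget=1.0,
                            lambda_help=1.0,
                            lambda_utility=1.0):
    """Combine the three loss terms used for textual unlearning (sketch)."""
    # 1) Forget term: gradient ascent on harmful responses, implemented by
    #    negating the usual LM cross-entropy so their likelihood is pushed down.
    loss_forget = -model(**harmful_batch).loss

    # 2) Helpfulness term: standard cross-entropy on safe/helpful responses
    #    to harmful prompts, steering the model toward harmless answers.
    loss_help = model(**helpful_refusal_batch).loss

    # 3) Utility term: standard cross-entropy on benign data so normal
    #    capability is preserved while unlearning harmful behavior.
    loss_utility = model(**utility_batch).loss

    return (lambda_forget * loss_forget
            + lambda_help * loss_help
            + lambda_utility * loss_utility)
```

Because only text batches are involved, the same objective applies unchanged to the language-model backbone regardless of the vision encoder, which is what makes the approach cheaper than multi-modal unlearning.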