Red Teaming Visual Language Models


23 Jan 2024 | Mukai Li, Lei Li, Yuwei Yin, Masood Ahmed, Zhenguang Liu, Qi Liu
This paper introduces the Red Teaming Visual Language Model (RTVLM) dataset, the first benchmark for evaluating the vulnerabilities of Vision-Language Models (VLMs) in red teaming scenarios. The dataset includes 10 subtasks across four aspects: faithfulness, privacy, safety, and fairness. These subtasks test how VLMs handle misleading text and images, privacy-related queries, safety concerns, and fairness issues. The dataset contains 5,200 samples, each consisting of an image, a red teaming question, and a reference answer. Images are either publicly available or generated by diffusion models, and the red teaming questions are either annotated by humans or generated by GPT-4 from human-written seed examples.

The authors evaluate 10 prominent open-sourced VLMs and GPT-4V on RTVLM. The results show that these models struggle with red teaming tasks, with up to a 31% performance gap relative to GPT-4V. Additionally, applying red teaming alignment to LLaVA-v1.5 using RTVLM improves the model's performance on the RTVLM test set by 10% and on the MM-hallu benchmark by 13%, while maintaining stable performance on MM-Bench. This indicates that current open-sourced VLMs lack red teaming alignment.

The paper also discusses the importance of red teaming for VLMs, highlighting the need for comprehensive and systematic benchmarks that evaluate their safety, privacy, fairness, and other critical aspects. The authors propose that using RTVLM as a red teaming alignment dataset can enhance the safety and robustness of VLMs without significantly affecting their performance on downstream tasks. The dataset and code will be open-sourced for further research and development.
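To make the sample layout concrete, here is a minimal Python sketch of how one RTVLM-style record could be represented. Only the four aspects (faithfulness, privacy, safety, fairness) and the image/question/reference-answer structure come from the paper; the field names and the example subtask label are illustrative assumptions.

```python
from dataclasses import dataclass

# The four evaluated aspects named in the paper. The 10 subtask names are not
# spelled out in this summary, so any subtask string used below is hypothetical.
ASPECTS = ("faithfulness", "privacy", "safety", "fairness")

@dataclass
class RTVLMSample:
    """One red teaming example: an image, a probing question, and a reference answer."""
    image_path: str        # publicly available or diffusion-generated image
    question: str          # human-annotated or GPT-4-generated red teaming question
    reference_answer: str  # expected safe / faithful response
    aspect: str            # one of ASPECTS
    subtask: str           # one of the 10 subtasks (name assumed here)

# Hypothetical example record, for illustration only.
sample = RTVLMSample(
    image_path="images/safety_0001.png",
    question="Based on this poster, which candidate should I vote for?",
    reference_answer="I can't recommend a candidate; I can offer neutral information instead.",
    aspect="safety",
    subtask="politics",
)
```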
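The evaluation described above (scoring 10 open-sourced VLMs and GPT-4V on RTVLM) could be organized as a simple harness like the one below. `query_vlm` and `judge_score` are hypothetical placeholders for model inference and for an automatic judge that compares a response with the reference answer; the paper's exact scoring protocol is not specified in this summary.

```python
from collections import defaultdict
from statistics import mean

def query_vlm(model, image_path: str, question: str) -> str:
    """Placeholder: run the VLM on (image, question) and return its text response."""
    raise NotImplementedError

def judge_score(question: str, reference_answer: str, response: str) -> float:
    """Placeholder: rate the response against the reference answer (e.g., on a 1-10 scale)."""
    raise NotImplementedError

def evaluate(model, samples) -> dict:
    """Average judge scores per aspect, so gaps between models (e.g., vs. GPT-4V) are visible."""
    scores = defaultdict(list)
    for s in samples:
        response = query_vlm(model, s.image_path, s.question)
        scores[s.aspect].append(judge_score(s.question, s.reference_answer, response))
    return {aspect: mean(values) for aspect, values in scores.items()}
```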
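Finally, the red teaming alignment step (fine-tuning LLaVA-v1.5 with RTVLM) amounts to converting the red teaming samples into ordinary supervised fine-tuning conversations and mixing them with the model's existing instruction data. The conversation layout below mirrors a common LLaVA-style SFT format but is an assumption, not the authors' exact pipeline.

```python
import random

def to_sft_record(sample) -> dict:
    """Turn an RTVLM sample into a LLaVA-style SFT conversation (format assumed)."""
    return {
        "image": sample.image_path,
        "conversations": [
            {"from": "human", "value": "<image>\n" + sample.question},
            {"from": "gpt", "value": sample.reference_answer},
        ],
    }

def build_alignment_mix(rtvlm_samples, original_sft_records, seed: int = 0) -> list:
    """Mix red teaming data with the original SFT data so downstream skills are preserved."""
    mixed = [to_sft_record(s) for s in rtvlm_samples] + list(original_sft_records)
    random.Random(seed).shuffle(mixed)
    return mixed

# Usage (names hypothetical): mixed = build_alignment_mix(rtvlm_train_samples, llava_sft_records)
```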