Red Teaming Visual Language Models


23 Jan 2024 | Mukai Li, Lei Li, Yuwei Yin, Masood Ahmed, Zhenguang Liu, Qi Liu
This paper introduces the Red Teaming Visual Language Model (RTVLM) dataset, the first benchmark for evaluating the vulnerabilities of Vision-Language Models (VLMs) in red teaming scenarios. The dataset includes 10 subtasks across four aspects: faithfulness, privacy, safety, and fairness. These subtasks test how VLMs handle misleading text and images, privacy-related queries, safety concerns, and fairness issues. The dataset contains 5,200 samples, each consisting of an image, a red teaming question, and a reference answer. Images are either publicly available or generated by diffusion models, and the red teaming questions are either annotated by humans or generated by GPT-4 from human-written seed examples.

The authors evaluate 10 prominent open-sourced VLMs and GPT-4V on RTVLM. The results show that these models struggle with red teaming tasks, with up to a 31% performance gap relative to GPT-4V. Additionally, applying red teaming alignment to LLaVA-v1.5 using RTVLM improves the model's performance on the RTVLM test set by 10% and on the MM-hallu benchmark by 13%, while maintaining stable performance on MM-Bench. This indicates that current open-sourced VLMs lack red teaming alignment.

The paper also discusses the importance of red teaming for VLMs, highlighting the need for comprehensive and systematic benchmarks that evaluate their safety, privacy, fairness, and other critical aspects. The authors propose that using RTVLM as a red teaming alignment dataset can enhance the safety and robustness of VLMs without significantly affecting their performance on downstream tasks. The dataset and code will be open-sourced for further research and development.
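To make the sample layout concrete, here is a minimal Python sketch of how one RTVLM-style record could be represented. Only the four aspects (faithfulness, privacy, safety, fairness) and the image/question/reference-answer structure come from the paper; the field names and the example subtask label are illustrative assumptions.

```python
from dataclasses import dataclass

# The four evaluated aspects named in the paper. The 10 subtask names are not
# spelled out in this summary, so any subtask string used below is hypothetical.
ASPECTS = ("faithfulness", "privacy", "safety", "fairness")

@dataclass
class RTVLMSample:
    """One red teaming example: an image, a probing question, and a reference answer."""
    image_path: str        # publicly available or diffusion-generated image
    question: str          # human-annotated or GPT-4-generated red teaming question
    reference_answer: str  # expected safe / faithful response
    aspect: str            # one of ASPECTS
    subtask: str           # one of the 10 subtasks (name assumed here)

# Hypothetical example record, for illustration only.
sample = RTVLMSample(
    image_path="images/safety_0001.png",
    question="Based on this poster, which candidate should I vote for?",
    reference_answer="I can't recommend a candidate; I can offer neutral information instead.",
    aspect="safety",
    subtask="politics",
)
```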
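The evaluation described above (scoring 10 open-sourced VLMs and GPT-4V on RTVLM) could be organized as a simple harness like the one below. `query_vlm` and `judge_score` are hypothetical placeholders for model inference and for an automatic judge that compares a response with the reference answer; the paper's exact scoring protocol is not specified in this summary.

```python
from collections import defaultdict
from statistics import mean

def query_vlm(model, image_path: str, question: str) -> str:
    """Placeholder: run the VLM on (image, question) and return its text response."""
    raise NotImplementedError

def judge_score(question: str, reference_answer: str, response: str) -> float:
    """Placeholder: rate the response against the reference answer (e.g., on a 1-10 scale)."""
    raise NotImplementedError

def evaluate(model, samples) -> dict:
    """Average judge scores per aspect, so gaps between models (e.g., vs. GPT-4V) are visible."""
    scores = defaultdict(list)
    for s in samples:
        response = query_vlm(model, s.image_path, s.question)
        scores[s.aspect].append(judge_score(s.question, s.reference_answer, response))
    return {aspect: mean(values) for aspect, values in scores.items()}
```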
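Finally, the red teaming alignment step (fine-tuning LLaVA-v1.5 with RTVLM) amounts to converting the red teaming samples into ordinary supervised fine-tuning conversations and mixing them with the model's existing instruction data. The conversation layout below mirrors a common LLaVA-style SFT format but is an assumption, not the authors' exact pipeline.

```python
import random

def to_sft_record(sample) -> dict:
    """Turn an RTVLM sample into a LLaVA-style SFT conversation (format assumed)."""
    return {
        "image": sample.image_path,
        "conversations": [
            {"from": "human", "value": "<image>\n" + sample.question},
            {"from": "gpt", "value": sample.reference_answer},
        ],
    }

def build_alignment_mix(rtvlm_samples, original_sft_records, seed: int = 0) -> list:
    """Mix red teaming data with the original SFT data so downstream skills are preserved."""
    mixed = [to_sft_record(s) for s in rtvlm_samples] + list(original_sft_records)
    random.Random(seed).shuffle(mixed)
    return mixed

# Usage (names hypothetical): mixed = build_alignment_mix(rtvlm_train_samples, llava_sft_records)
```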