T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

8 Sep 2024 | Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao
This paper introduces T2VSafetyBench, a new benchmark for evaluating the safety of text-to-video (T2V) models. The rapid development of T2V models such as Sora has raised concerns about potential security risks, including the generation of illegal or unethical content. Previous evaluations have focused primarily on video quality and have not adequately addressed the unique temporal risks inherent in video generation.

T2VSafetyBench defines 12 critical aspects of video generation safety, including pornography, violence, gore, public figures, discrimination, political sensitivity, illegal activities, disturbing content, misinformation, copyright infringement, and temporal risk. A malicious prompt dataset is constructed from real-world prompts, LLM-generated prompts, and jailbreak-attack-based prompts, then manually screened and refined to ensure quality.

The benchmark evaluates these aspects using both GPT-4 and human assessments. The results show that no single model excels in all aspects; different models exhibit different strengths. The correlation between GPT-4 assessments and manual reviews is generally high, indicating that GPT-4 can be used effectively for large-scale evaluation. However, there is a trade-off between the usability and safety of T2V models. As video generation technology advances, safety risks are likely to increase, underscoring the urgency of prioritizing video safety. T2VSafetyBench thus provides insights into the safety of video generation in the era of generative AI. The benchmark comprehensively evaluates several T2V models, including Pika, Gen2, Stable Video Diffusion, and Open-Sora, and finds that each model has distinct strengths and weaknesses across aspects.
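The summary does not reproduce the benchmark's evaluation code. As a rough illustration only, the aggregation step of a frame-sampling safety evaluation might look like the sketch below, where the `judge` callable is a hypothetical stand-in for a GPT-4 vision query that flags a single frame as unsafe; the function name and frame-sampling scheme are assumptions, not the paper's actual implementation.

```python
from typing import Callable, List

def unsafe_rate(videos: List[List[bytes]],
                judge: Callable[[bytes], bool],
                frames_per_video: int = 4) -> float:
    """Fraction of videos judged unsafe.

    A video counts as unsafe if any of its sampled frames is flagged
    by the judge (here a placeholder for a vision-model safety query).
    """
    if not videos:
        return 0.0
    flagged = 0
    for frames in videos:
        # Evenly sample up to `frames_per_video` frames from the clip.
        step = max(1, len(frames) // frames_per_video)
        sampled = frames[::step][:frames_per_video]
        if any(judge(f) for f in sampled):
            flagged += 1
    return flagged / len(videos)
```

Aggregating per-aspect unsafe rates in this way is what allows model-versus-model comparisons across the 12 safety aspects.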
For example, Gen2 performs well in mitigating sexual and disturbing content, while Pika shows strong defensive capabilities in politically sensitive and copyright-related areas. The benchmark also highlights the importance of temporal risk, a security risk unique to T2V models. The results suggest that as models become more capable, the risk of generating unsafe content may increase unless it is explicitly addressed. T2VSafetyBench aims to provide a comprehensive understanding of video generation safety and to help improve it in the future.
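The claimed agreement between GPT-4 assessments and manual reviews is a correlation over evaluation scores. One standard way to quantify such agreement is a Pearson correlation over per-aspect unsafe rates; the minimal sketch below shows the computation only, since the paper's actual aspect-level numbers are not reproduced here.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length
    score sequences (e.g. GPT-4 vs. human unsafe rates per aspect)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient near 1.0 would support using GPT-4 as a scalable proxy for human review, while a low value would argue for keeping humans in the loop.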