GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

28 Feb 2025 | Hongzhan Lin*, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma†
GOAT-Bench is a comprehensive meme benchmark designed to evaluate the safety and reasoning capabilities of large multimodal models (LMMs) in detecting social abuse in memes. The benchmark comprises 6,626 memes covering themes such as hate speech, sexism, cyberbullying, sarcasm, and harmful content, and assesses LMMs' ability to accurately identify and respond to these nuanced forms of abuse. It is publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this critical area.

The study evaluates 11 LMMs: GPT-4V, CogVLM, LLaVA-1.5, InstructBLIP, MiniGPT-4, Qwen-VL, OpenFlamingo, MMGPT, Fuyu, mPLUG-Owl, and MiniGPT-v2. GPT-4V achieves the highest overall performance with a macro-averaged F1 score of 70.29%, while every other model scores below 62%.

The study also examines how various prompting strategies, including chain-of-thought (CoT) prompting, in-context learning (ICL), and supervised fine-tuning, affect LMMs' ability to detect social abuse in memes. The findings indicate that current LMMs still exhibit deficiencies in safety awareness and remain insensitive to implicit abuse, underscoring the need for continued advances in LMM safety and human alignment on complex tasks like those in GOAT-Bench. The benchmark further shows that models with more parameters do not always perform better, that the effectiveness of prompting strategies varies by model and task, that LMMs struggle with cross-lingual detection, and that sarcasm detection remains a particular challenge. Together, the results highlight the importance of improving LMMs' understanding of humor and social abuse so they can handle multimodal content safely and effectively.
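The headline metric above, macro-averaged F1, is the unweighted mean of per-class F1 scores, so the "harmful" and "benign" classes count equally regardless of how often each appears. The sketch below shows how such a score could be computed; the label names and the toy predictions are illustrative assumptions, not data from the paper.

```python
def macro_f1(y_true, y_pred, labels=("harmful", "benign")):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical model outputs on four memes (not from the benchmark):
gold = ["harmful", "benign", "harmful", "benign"]
pred = ["harmful", "benign", "benign", "benign"]
print(f"macro-F1 = {macro_f1(gold, pred):.4f}")  # 0.7333
```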
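To make the prompting comparison concrete: a zero-shot query asks the model for a verdict directly, while a CoT-style query asks it to describe the image and overlaid text and reason step by step before answering. The following is a minimal sketch of that contrast, assuming a binary harmful/benign task; `query_lmm`, the prompt wording, and the answer parsing are hypothetical placeholders, not the paper's actual prompts.

```python
# Zero-shot prompt: request only the final verdict.
ZERO_SHOT = (
    "You are shown a meme (an image with overlaid text). "
    "Is this meme harmful? Answer only 'yes' or 'no'."
)

# CoT-style prompt: elicit intermediate reasoning before the verdict.
COT = (
    "You are shown a meme (an image with overlaid text). "
    "First describe the image, then the overlaid text, then reason step by step "
    "about whether the combination conveys hate speech, sexism, cyberbullying, "
    "sarcasm, or other harm. End with a final verdict on its own line: 'yes' or 'no'."
)

def query_lmm(image_path: str, prompt: str) -> str:
    """Hypothetical wrapper around a multimodal model's chat endpoint."""
    raise NotImplementedError("wire this to the LMM under evaluation")

def classify_meme(image_path: str, use_cot: bool = False) -> str:
    answer = query_lmm(image_path, COT if use_cot else ZERO_SHOT)
    # With CoT the verdict is on the last line; zero-shot returns a bare yes/no.
    last_line = (answer.strip().lower().splitlines() or [""])[-1]
    return "harmful" if "yes" in last_line else "benign"
```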