MM-SOC: Benchmarking Multimodal Large Language Models in Social Media Platforms

24 Jul 2024 | Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, Srijan Kumar
MM-SOC is a benchmark designed to evaluate multimodal large language models (MLLMs) on social media tasks. It comprises ten tasks, including misinformation detection, hate speech detection, and social context generation, and incorporates a new large-scale YouTube tagging dataset. The study evaluates ten open-source MLLMs, including LLaVA-v1.5, BLIP2, InstructBLIP, and LLaMA-Adapter-v2, in both zero-shot and fine-tuned settings.

Results show that zero-shot MLLMs perform poorly, often matching or falling below random baselines, while fine-tuned models improve significantly. LLaVA-v1.5 achieves the best performance on most tasks, particularly text generation. The benchmark highlights the need to improve MLLMs' social understanding capabilities and suggests that fine-tuning with explanations can enhance performance.

MM-SOC provides a comprehensive evaluation of MLLMs' ability to handle social media content, revealing limitations in self-improvement and demonstrating the effectiveness of fine-tuning. The study also discusses the ethical implications and broader impacts of MLLMs, emphasizing the importance of addressing biases and ensuring fairness in their deployment. The benchmark aims to guide future research and development of MLLMs for social media applications.
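To make the zero-shot setup concrete, below is a minimal sketch of how one might prompt an open-source MLLM such as LLaVA-v1.5 to label an image-plus-text social media post for misinformation. It assumes the Hugging Face `llava-hf/llava-1.5-7b-hf` checkpoint; the prompt wording and yes/no label parsing are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Zero-shot sketch: prompting a LLaVA-v1.5 checkpoint to label a social
# media post (image + text) as misinformation or not.
# NOTE: the checkpoint, prompt, and label parsing below are assumptions
# for illustration; they are not taken from the MM-SOC paper.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def zero_shot_misinfo(image_path: str, post_text: str) -> str:
    """Return 'yes' or 'no' for whether the post looks like misinformation."""
    image = Image.open(image_path).convert("RGB")
    # LLaVA-v1.5 chat format: the <image> token marks where the visual
    # features are inserted into the prompt.
    prompt = (
        "USER: <image>\n"
        f"Post text: {post_text}\n"
        "Does this post contain misinformation? Answer yes or no.\n"
        "ASSISTANT:"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    # Keep only the model's reply after the prompt; crude parse for the label.
    reply = answer.split("ASSISTANT:")[-1].strip().lower()
    return "yes" if reply.startswith("yes") else "no"
```

Running a loop of this kind over each task's examples and comparing predictions to gold labels is one plausible way to reproduce the kind of zero-shot numbers the paper reports; the fine-tuned setting would instead update the model on task-specific training data before evaluation.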