Detecting Multimedia Generated by Large AI Models: A Survey

14 May 2025 | Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Senior Member, IEEE, Xin Li, Fellow, IEEE, Luisa Verdoliva, Fellow, IEEE, Shu Hu*, Member, IEEE
The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has led to an increase in AI-generated multimedia content, which presents significant risks such as misuse, societal disruption, and ethical concerns. This survey addresses the gap in systematic research on detecting LAIM-generated multimedia. It provides a comprehensive review of existing detection methods and introduces a novel taxonomy organized by media modality and aligned with two perspectives: *pure detection* (enhancing detection performance) and *beyond detection* (adding attributes such as generalizability, robustness, and interpretability). The survey covers text, images, videos, audio, and multimodal content, offering insights into generation mechanisms, public datasets, online detection tools, and evaluation metrics. It also highlights the broader societal impact of LAIM-generated multimedia on social media and identifies current challenges and future research directions. The survey is intended to contribute to global AI security efforts and ensure the integrity of digital information.