14 May 2025 | Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu
This survey provides a comprehensive overview of detecting multimedia generated by Large AI Models (LAIMs), including text, images, videos, audio, and multimodal content. The rapid development of LAIMs, such as diffusion models and large language models, has led to the increasing integration of AI-generated multimedia into daily life, but it also raises significant risks, including misuse, societal disruption, and ethical concerns. Detecting such content is therefore crucial, yet systematic surveys on the topic have been lacking. This survey addresses that gap by presenting the first comprehensive review of existing research on LAIM-generated multimedia detection, introducing a novel taxonomy that categorizes detection methods by media modality and aligns them with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes such as generalizability, robustness, and interpretability to detectors). The survey also provides an overview of generation mechanisms, public datasets, online detection tools, and evaluation metrics, offering a valuable resource for researchers and practitioners. Additionally, it offers a focused analysis from a social media perspective to highlight the broader societal impact of LAIM-generated multimedia. The survey identifies current challenges in detection and proposes directions for future research, aiming to fill an academic gap and contribute to global AI security efforts. It is organized into sections on generation, detection, and related works, covering each of the modalities above.
The survey also discusses the challenges and future directions in detecting LAIM-generated multimedia, emphasizing the need for robust, socially grounded benchmarks and detection methods.