A Contemporary Survey on Deepfake Detection: Datasets, Algorithms, and Challenges

31 January 2024 | Liang Yu Gong, Xue Jun Li
This survey provides a comprehensive overview of deepfake detection methods and datasets from 2019 to 2023. It groups the methods into four categories according to their feature extraction approach and network architecture: traditional CNN-based detection, CNN backbones with semi-supervised detection, transformer-based detection, and biological signal detection. The survey also evaluates several representative deepfake detection datasets, including FaceForensics++, DFDC, and Celeb-DF V2, highlighting their advantages and disadvantages. Additionally, it compares the performance of state-of-the-art detection models using different evaluation metrics, such as accuracy, AUC, and EER, and finds that accuracy degrades significantly under cross-dataset evaluation.
The survey concludes with three main findings: (1) traditional CNN-based methods struggle to generalize to unseen data because of varying data quality and more realistic manipulations; (2) semi-supervised methods, while still using CNN backbones, focus more on calculating representation similarity; and (3) transformer-based methods, particularly video transformers, show promise in capturing spatial and temporal information, leading to better generalization and performance.
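The three metrics named in the summary (accuracy, AUC, and EER) can be illustrated with a small sketch. The labels and scores below are made-up toy values, not results from the survey, and the function names are hypothetical helpers written for this illustration. The sketch assumes label 1 means "fake" and that a higher score means "more likely fake".

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold, from (0, 0) to (1, 1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Samples with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points):
    """Equal error rate: the error level at which FPR equals FNR (= 1 - TPR).

    Found by linear interpolation between neighbouring ROC points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 + y0 - 1  # FPR - FNR at the first point
        d1 = x1 + y1 - 1  # FPR - FNR at the second point
        if d0 <= 0 <= d1:
            t = -d0 / (d1 - d0)
            return x0 + t * (x1 - x0)
    raise ValueError("ROC curve does not cross the FPR = FNR line")


# Toy data (assumed for illustration only): 4 fake and 4 real samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

curve = roc_points(labels, scores)
print("accuracy:", accuracy(labels, scores))
print("AUC:", auc(curve))
print("EER:", eer(curve))
```

Accuracy depends on the chosen threshold, while AUC and EER summarize behaviour across all thresholds, which is one reason detection papers often report more than one of them.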