This paper presents a deep learning-based method for distinguishing DeepFake videos from real videos by detecting face warping artifacts. The method leverages the fact that DeepFake algorithms generate images of limited resolution, which must be warped to match the source face in the video. This warping process introduces distinct artifacts that can be effectively captured by convolutional neural networks (CNNs). Unlike previous methods that require large amounts of real and DeepFake data for training, this method generates negative examples by simulating the resolution inconsistency in affine face warping using simple image processing operations, saving time and resources. The method is robust across different sources of DeepFake videos due to the general presence of these artifacts.
The method detects faces and facial landmarks, aligns each face to a standard configuration, applies Gaussian blurring, and affine-warps the result back into the original image. Negative examples are thus generated from real images alone, by aligning faces at multiple scales and applying Gaussian blur to simulate the resolution inconsistency of face warping. Four CNN models (VGG16, ResNet50, ResNet101, and ResNet152) are trained on these examples. The method is evaluated on two datasets, UADFV and DeepfakeTIMIT, achieving high AUC performance; the ResNet50 model performs best, outperforming other methods by significant margins. The method is also tested on DeepFake videos collected from YouTube, where it remains effective. The results show that the artifacts introduced by the affine face warping process are a reliable cue for identifying DeepFake videos. The paper concludes that the method is effective and highlights the need for further improvements in detecting DeepFake videos under varied conditions.
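The negative-example generation step can be sketched as follows. This is a minimal toy version, not the authors' implementation: the paper uses face and landmark detection with a proper affine alignment, whereas this sketch operates on a grayscale array, uses crude strided downsampling in place of alignment, and nearest-neighbour upsampling in place of the affine warp back. The function names and parameters (`simulate_warp_artifact`, `scale`, `sigma`) are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalized 1-D Gaussian kernel of length 2*radius + 1.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma=1.5):
    # Separable Gaussian blur with reflect padding (output shape == input shape).
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode="reflect")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def simulate_warp_artifact(face, scale=4, sigma=1.5):
    """Simulate the resolution inconsistency of DeepFake face warping:
    downsample the aligned face, blur it, and resize it back to full size.
    Nearest-neighbour resampling stands in for the affine warp."""
    h, w = face.shape
    small = face[::scale, ::scale]          # crude stand-in for aligned low-res face
    small = blur(small, sigma)              # smooth the low-resolution face
    # Upsample back and crop to the original size (stand-in for warping back).
    return np.repeat(np.repeat(small, scale, axis=0), scale, axis=1)[:h, :w]

rng = np.random.default_rng(0)
face = rng.random((64, 64))                 # stand-in for a real aligned face crop
negative = simulate_warp_artifact(face)     # training negative: smoothed face region
```

Because negatives are manufactured from real images this way, no DeepFake videos are needed at training time, which is the main practical advantage the paper claims over data-hungry baselines.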
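The AUC metric used in the evaluation can be computed from score ranks (the Mann-Whitney U statistic). A minimal sketch, assuming hypothetical per-frame classifier scores and ignoring tied scores; the paper's actual evaluation pipeline is not specified here:

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the probability that a random positive is scored above a random negative.
    Ties are ignored for simplicity."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical per-frame scores: real frames labelled 0, DeepFake frames 1.
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.3, 0.9, 0.7, 0.8])
print(auc(labels, scores))  # perfect separation gives 1.0
```

An AUC of 0.5 corresponds to chance; the paper reports AUCs well above competing methods on both UADFV and DeepfakeTIMIT.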