16 Mar 2020 | Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu
Celeb-DF is a large-scale challenging dataset for DeepFake forensics, containing 5,639 high-quality DeepFake videos of celebrities generated using an improved synthesis process. The dataset aims to address the limitations of existing DeepFake datasets, which often have low visual quality and do not resemble DeepFake videos circulating online.
The Celeb-DF dataset includes real videos of 59 celebrities of diverse gender, age, and ethnic backgrounds. The DeepFake videos are generated with an enhanced synthesis method, which yields significantly better visual quality than existing datasets. The authors use the dataset to evaluate DeepFake detection methods and find that Celeb-DF is challenging for most of them, even though some achieve high accuracy on previous datasets. The paper also includes a comprehensive evaluation of current DeepFake detection methods, showing that the second-generation datasets (DFD, DFDC, and Celeb-DF) are more difficult than the first-generation datasets (UADFV, DF-TIMIT, and FF-DF). The evaluation further shows that detection performance is affected by video compression, with some methods performing better at lower compression levels. The results indicate that there is still much room for improvement in DeepFake detection methods. The authors conclude that Celeb-DF is a valuable resource for developing and evaluating DeepFake detection methods, and that future work should focus on improving the visual quality of the synthesized videos and incorporating anti-forensic techniques into the dataset.
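The cross-dataset comparison described above can be illustrated with a small sketch. This summary does not say which metric the evaluation uses, so the example assumes ROC AUC, a common choice for binary real/fake detection. The dataset names ("easier_set", "harder_set"), labels, and scores are hypothetical placeholders, not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    labels: 1 = fake, 0 = real (assumed convention).
    scores: detector output, higher = more likely fake.
    """
    fake = [s for l, s in zip(labels, scores) if l == 1]
    real = [s for l, s in zip(labels, scores) if l == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake sample")
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0   # fake correctly ranked above real
            elif f == r:
                wins += 0.5   # ties count as half
    return wins / (len(fake) * len(real))


# Hypothetical detector outputs on two made-up datasets.
results = {
    "easier_set": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]),
    "harder_set": ([1, 1, 1, 0, 0, 0], [0.9, 0.4, 0.3, 0.6, 0.5, 0.1]),
}

auc_by_dataset = {name: roc_auc(l, s) for name, (l, s) in results.items()}
for name, auc in auc_by_dataset.items():
    print(f"{name}: AUC = {auc:.3f}")
```

Under this assumed setup, a lower AUC on one dataset than on another is the sense in which that dataset is "harder" for the same detector. The pairwise loop is quadratic in the number of samples, which is fine for illustration but would be replaced by a rank-based computation on real data.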