AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

23 Jan 2024 | Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan
AIGCBench is a comprehensive and scalable benchmark for evaluating video generation tasks, with a primary focus on Image-to-Video (I2V) generation. It addresses the limitations of existing benchmarks by providing a diverse, open-domain image-text dataset on which different state-of-the-art algorithms can be evaluated under equivalent conditions. To build this dataset, AIGCBench uses a text combiner together with GPT-4 to create rich text prompts, which advanced Text-to-Image models then turn into images.

The benchmark comprises three modules: the evaluation dataset, the evaluation metrics, and the video generation models to be assessed. The evaluation dataset covers two types of data, video-text and image-text pairs, with the image-text portion expanded through the generation pipeline above. The evaluation metrics consist of 11 metrics spanning four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. These include both reference video-based and reference video-free metrics, making full use of both dataset types, and human validation confirms that the proposed evaluation standard correlates well with human judgment, providing insight into the strengths and weaknesses of current I2V algorithms.

AIGCBench represents a significant step toward standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. The dataset and evaluation code are open-sourced on the project website: https://www.benchcouncil.org/AIGCBench.
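The prompt-generation step can be pictured as a simple combinatorial expansion before GPT-4 refinement. The sketch below is an illustrative assumption about how a "text combiner" might work — the attribute lists, template, and function name are hypothetical, not the authors' actual vocabulary or implementation:

```python
import itertools

# Hypothetical attribute vocabularies; the paper's real lists
# and categories may differ.
SUBJECTS = ["a red fox", "an astronaut"]
ACTIONS = ["running", "dancing"]
SCENES = ["in a snowy forest", "on a city street"]

def combine_prompts(subjects, actions, scenes):
    """Combine attribute lists into prompts via a Cartesian product."""
    return [
        f"{subject} {action} {scene}"
        for subject, action, scene in itertools.product(subjects, actions, scenes)
    ]

prompts = combine_prompts(SUBJECTS, ACTIONS, SCENES)
# Each combined prompt could then be enriched by GPT-4 and passed to a
# Text-to-Image model to produce the image half of an image-text pair.
```

Even small vocabularies expand quickly under the product (2 × 2 × 2 = 8 prompts here), which is how a compact seed set can yield a diverse, open-domain evaluation dataset.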
By evaluating control-video alignment, motion effects, temporal consistency, and video quality, AIGCBench captures the main aspects of video generation; by combining reference video-based and video-free metrics, it avoids relying exclusively on either video-text or image-text data. Experimental results show that the evaluation standard correlates well with human ratings, confirming its effectiveness. A detailed analysis of different I2V algorithms highlights their respective strengths and weaknesses. The findings suggest that existing solutions still have significant room for improvement, and that integrating AIGCBench's evaluation criteria into the development process could yield algorithms better aligned with human preferences. The analysis also surfaces limitations shared by current I2V algorithms, such as the inability to generate long videos and slow inference speed. Overall, the benchmark provides a scalable and precise assessment methodology, setting the stage for continued enhancement and innovation in this rapidly evolving research field.
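A reference video-free metric like temporal consistency is typically computed from per-frame features alone. The sketch below shows one common formulation — mean cosine similarity between embeddings of consecutive frames — as a minimal illustration; the feature extractor (e.g. CLIP image features) and the exact aggregation used by AIGCBench are assumptions here, not taken from the paper:

```python
import numpy as np

def temporal_consistency(frame_embeddings: np.ndarray) -> float:
    """Mean cosine similarity between consecutive frame embeddings.

    frame_embeddings: (T, D) array with one feature vector per frame,
    e.g. from a pretrained image encoder (an assumption for this sketch).
    Returns a value near 1.0 for smooth, consistent videos.
    """
    # L2-normalize each frame's embedding so dot products are cosines.
    normed = frame_embeddings / np.linalg.norm(
        frame_embeddings, axis=1, keepdims=True
    )
    # Cosine similarity of frame t vs. frame t+1, averaged over the video.
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    return float(np.mean(sims))
```

Because it needs no ground-truth video, a metric of this shape applies equally to samples generated from the image-text dataset, where no reference clip exists.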
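Reference video-based metrics, by contrast, compare generated frames against a ground-truth clip. As a generic illustration of that family, here is average per-frame PSNR; AIGCBench's actual reference-based metrics may use different features and measures, so treat this as a stand-in for the category rather than the paper's method:

```python
import numpy as np

def video_psnr(frames_gen: np.ndarray, frames_ref: np.ndarray,
               max_val: float = 255.0) -> float:
    """Average per-frame PSNR between a generated and a reference video.

    Both inputs: (T, H, W, C) arrays of pixel values in [0, max_val].
    Higher is better; identical videos yield a very large value.
    """
    diff = frames_gen.astype(np.float64) - frames_ref.astype(np.float64)
    mse = np.mean(diff ** 2, axis=(1, 2, 3))      # one MSE per frame
    mse = np.maximum(mse, 1e-12)                  # avoid log(0) on exact matches
    psnr = 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)
    return float(np.mean(psnr))
```

Using both families together — reference-based scores on the video-text subset and reference-free scores everywhere — is what lets the benchmark exploit both dataset types.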