AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI


23 Jan 2024 | Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan
AIGCBench is a comprehensive and scalable benchmark designed to evaluate various video generation tasks, with a focus on Image-to-Video (I2V) generation. It addresses the limitations of existing benchmarks by including a diverse, open-domain image-text dataset that allows different state-of-the-art algorithms to be evaluated under equivalent conditions. AIGCBench employs a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. The benchmark includes 11 metrics across four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. These metrics are both reference video-based and reference video-free, ensuring a comprehensive evaluation strategy. The proposed evaluation standard correlates well with human judgment, providing insights into the strengths and weaknesses of current I2V algorithms. AIGCBench represents a significant step toward standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. The dataset and evaluation code are open-sourced on the project website: https://www.benchcouncil.org/AIGCBench.

AIGCBench comprises three modules: the evaluation dataset, the evaluation metrics, and the video generation models to be assessed. The benchmark encompasses two types of datasets: video-text and image-text. To construct a more comprehensive evaluation dataset, the image-text dataset is expanded using a generation pipeline. For a thorough evaluation of video generation models, a set of 11 evaluation metrics across four dimensions is introduced; these include both reference video-based and reference video-free metrics, making full use of both dataset types. Human validation confirms the soundness of the proposed evaluation standards.
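To make the "temporal consistency" dimension concrete, the sketch below computes a simple reference video-free consistency score as the mean cosine similarity between consecutive frames. This is an illustrative stand-in only: AIGCBench's actual metrics operate on learned features (e.g. deep image embeddings) rather than raw pixels, which are used here just to keep the example self-contained. The function name and frame layout are assumptions.

```python
import numpy as np

def temporal_consistency(frames):
    """Mean cosine similarity between consecutive frames.

    frames: array of shape (T, H, W, C). Higher values indicate smoother,
    more temporally consistent video. Raw pixels are used here only for
    illustration; a real metric would use deep image features.
    """
    flat = frames.reshape(len(frames), -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-8, None)       # L2-normalize each frame
    sims = np.sum(unit[:-1] * unit[1:], axis=1)    # cosine sim of adjacent pairs
    return float(sims.mean())
```

A perfectly static clip scores 1.0, while abrupt frame-to-frame changes pull the score down; in practice such a score is read alongside the motion-effects dimension, since a frozen video is "consistent" but has no motion.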
AIGCBench enables the comparison of different algorithms under equivalent evaluation conditions, allowing an analysis of the strengths and weaknesses of state-of-the-art video generation algorithms. The first version of AIGCBench addresses the current lack of a reasonable benchmark for I2V tasks by providing a thorough evaluation of them; subsequent versions plan to include more video generation tasks and place them under equivalent evaluation conditions for a fair comparison.

AIGCBench is engineered to meet the diverse demands of users looking to animate a broad array of static images. It addresses the challenge of animating images such as a blue dragon skateboarding in Times Square by deploying a text combiner to generate a rich assortment of text prompts spanning a multitude of subjects, behaviors, backgrounds, and artistic styles. GPT-4 is used to enhance these prompts, rendering them more vivid and intricate, and the detailed prompts then guide image generation through state-of-the-art Text-to-Image diffusion models. By blending video-text and image-text datasets, AIGCBench ensures a robust and comprehensive evaluation of a range of I2V algorithms across four critical dimensions: control-video alignment, motion effects, temporal consistency, and video quality. This integrated framework combines reference video-based and reference video-free metrics.
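The text-combiner idea described above can be sketched as sampling from the Cartesian product of attribute pools. This is a minimal illustration, not AIGCBench's implementation: the attribute lists and function name below are hypothetical placeholders, and the real pipeline additionally passes each prompt through GPT-4 for enrichment before Text-to-Image generation.

```python
import itertools
import random

# Hypothetical attribute pools; the real benchmark's lists are larger and curated.
SUBJECTS = ["a blue dragon", "an astronaut", "a vintage car"]
BEHAVIORS = ["skateboarding", "dancing", "drifting"]
BACKGROUNDS = ["in Times Square", "on the moon", "in a neon-lit alley"]
STYLES = ["photorealistic", "watercolor painting", "cyberpunk art"]

def combine_prompts(n=5, seed=0):
    """Sample n distinct prompts from the product of the attribute pools."""
    rng = random.Random(seed)
    combos = list(itertools.product(SUBJECTS, BEHAVIORS, BACKGROUNDS, STYLES))
    picked = rng.sample(combos, n)  # without replacement, so prompts are distinct
    return [f"{s} {b} {bg}, {st} style" for s, b, bg, st in picked]

# Each prompt would next be enriched (e.g. by GPT-4) and fed to a
# Text-to-Image diffusion model to produce the image half of an image-text pair.
prompts = combine_prompts(3)
```

Even these three small pools yield 81 distinct combinations, which is why a combinatorial generator scales the image-text dataset far beyond what hand-written prompts could cover.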