[slides and audio] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

ChronoMagic-Bench is a benchmark for evaluating the temporal and metamorphic capabilities of text-to-video (T2V) models when generating time-lapse videos. Whereas existing benchmarks focus on visual quality and textual relevance, ChronoMagic-Bench emphasizes a model's ability to generate time-lapse videos with significant metamorphic amplitude and temporal coherence. It includes 1,649 prompts and real-world videos organized into four major categories of time-lapse video (biological, human-created, meteorological, and physical phenomena), which are further divided into 75 subcategories. Two new automatic metrics are introduced: MTScore, which measures metamorphic amplitude, and CHScore, which assesses temporal coherence.

Using ChronoMagic-Bench, the authors conduct comprehensive manual evaluations of ten representative T2V models, revealing their strengths and weaknesses across the different prompt categories. The results expose three weaknesses of existing T2V models: they fail to generate time-lapse videos with large variations, they adhere poorly to prompts, and they produce flickering even when visual quality is high. The authors also create a large-scale dataset, ChronoMagic-Pro, containing 460k high-quality 720p time-lapse videos with detailed captions. Together, the benchmark and dataset provide a comprehensive framework for T2V research, aiming to address these limitations and promote advances in T2V generation.

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

26 Jun 2024 | Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan