MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

7 Apr 2024 | Shenghai Yuan1*, Jinfa Huang2*, Yujun Shi3, Yongqi Xu4, Ruijie Zhu5, Bin Lin1, Xinhua Cheng1, Li Yuan1†, Jiebo Luo2
This paper introduces MagicTime, a framework for generating metamorphic time-lapse videos, which capture the physical knowledge and variation underlying real-world phenomena. MagicTime addresses the limitations of existing Text-to-Video (T2V) models by incorporating physical knowledge into metamorphic video generation. The key contributions are:

1. **MagicAdapter**: a scheme that decouples spatial and temporal training, allowing the model to encode more physical knowledge from metamorphic videos and transforming pre-trained T2V models into metamorphic video generators (a minimal training sketch follows this list).
2. **Dynamic Frames Extraction**: a sampling strategy adapted to metamorphic time-lapse videos, which span a wide variation range and cover dramatic object metamorphosis, ensuring the model sees complex and varied content (see the frame-sampling sketch below).
3. **Magic Text-Encoder**: an improved text encoder that strengthens the model's understanding of metamorphic video prompts, distinguishing metamorphic from general prompts (a prompt-adapter sketch closes this section).

The authors also curate the ChronoMagic dataset, a collection of 2,265 high-quality time-lapse videos with detailed captions, designed specifically to unlock metamorphic video generation. Extensive experiments show that MagicTime generates high-quality, dynamic metamorphic videos, outperforming existing methods in visual quality, frame consistency, metamorphic amplitude, and text alignment. The paper concludes by discussing MagicTime's potential for building metamorphic simulators of the physical world, along with its ethical considerations.
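To make the decoupled-training idea concrete, here is a minimal PyTorch sketch. It assumes an adapter-style residual branch attached to frozen pretrained layers; the module names, sizes, and the `decouple_training` helper are illustrative assumptions, not the paper's exact MagicAdapter architecture.

```python
import torch
import torch.nn as nn

class MagicAdapterSketch(nn.Module):
    """Illustrative adapter: a low-rank residual branch added after a
    frozen pretrained layer. Names and sizes are assumptions, not the
    paper's exact design."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # residual branch starts at zero
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def decouple_training(model: nn.Module) -> None:
    """Freeze the pretrained T2V weights and leave only adapter
    parameters trainable, mirroring the decoupled spatial/temporal
    training idea (a simplification of the paper's two-stage scheme)."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

Zero-initializing the up-projection makes each adapter start as an identity mapping, so training begins from the pretrained model's behavior and only gradually injects metamorphic knowledge.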
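Dynamic Frames Extraction can be pictured as sampling a fixed number of frames spread across the entire clip rather than a short fixed-stride window. The sketch below is a hypothetical reading of that strategy; `dynamic_frame_indices` and its defaults are assumptions for illustration.

```python
import numpy as np

def dynamic_frame_indices(total_frames: int, num_frames: int = 16) -> list[int]:
    """Pick indices spread evenly over the whole video so the sampled
    clip covers the full metamorphosis, instead of a fixed-stride
    window that would see only a fragment of it."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    # Evenly spaced positions from the first to the last frame.
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int).tolist()
```

For a 3,000-frame blooming time-lapse, `dynamic_frame_indices(3000)` returns 16 indices roughly 200 frames apart, so the sampled clip still spans the full bud-to-bloom transformation.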
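Finally, one plausible shape for a metamorphosis-aware text encoder is a small gated residual branch over a frozen pretrained encoder, letting metamorphic prompts be embedded differently from general ones. This is a speculative sketch: `MagicTextEncoderSketch`, the gating scheme, and the assumption that `base_encoder` returns pooled embeddings are all illustrative, not the paper's Magic Text-Encoder.

```python
import torch
import torch.nn as nn

class MagicTextEncoderSketch(nn.Module):
    """Wrap a frozen text encoder with a learned, gated residual branch.
    Assumes `base_encoder(token_ids)` returns a (batch, dim) embedding."""
    def __init__(self, base_encoder: nn.Module, dim: int):
        super().__init__()
        self.base = base_encoder
        for param in self.base.parameters():
            param.requires_grad = False      # keep the pretrained encoder frozen
        self.residual = nn.Linear(dim, dim)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0: starts as identity

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.base(token_ids)
        return h + torch.tanh(self.gate) * self.residual(h)
```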