22 Mar 2024 | Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun
Mora is a multi-agent framework designed to enable generalist video generation, aiming to replicate the capabilities of Sora, the first large-scale video generation model. The framework leverages multiple advanced visual AI agents to perform tasks such as text-to-video generation, text-conditional image-to-video generation, extending generated videos, video-to-video editing, connecting videos, and simulating digital worlds. Mora achieves performance comparable to Sora in various tasks but has a notable performance gap when assessed holistically. The framework is open-source, allowing for broader collaboration and innovation in video generation.
Mora's approach involves decomposing video generation into subtasks, each handled by a dedicated agent, enabling flexible and efficient video generation. The framework's agents work collaboratively to generate high-resolution, temporally consistent videos from text prompts. Mora's performance is evaluated using various metrics, including video quality, consistency, and temporal coherence. While Mora shows strong results in many tasks, it still lags behind Sora in some aspects, particularly in video quality and length. The framework's open-source nature and collaborative approach offer significant potential for future advancements in video generation.
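The decomposition described above can be illustrated with a minimal sketch, in which each subtask is handled by one agent and a task is a chain of agents. This is not Mora's actual code: the agent names (`enhance_prompt`, `text_to_image`, `image_to_video`, `extend_video`), the `Artifact` type, and the `PIPELINES` table are all hypothetical, and the agents are stubs that only record what they would do rather than call real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """Hypothetical container for whatever is passed between agents."""
    kind: str                      # "text", "image", or "video"
    payload: str                   # placeholder for real data (prompt, pixels, frames)
    history: List[str] = field(default_factory=list)  # names of agents applied so far


Agent = Callable[[Artifact], Artifact]


def make_agent(name: str, accepts: str, produces: str) -> Agent:
    """Build a stub agent that checks its input kind and tags the output."""
    def run(artifact: Artifact) -> Artifact:
        if artifact.kind != accepts:
            raise ValueError(f"{name} expects {accepts}, got {artifact.kind}")
        return Artifact(
            kind=produces,
            payload=f"{name}({artifact.payload})",
            history=artifact.history + [name],
        )
    return run


# Hypothetical agents; a real system would wrap a model behind each one.
enhance_prompt = make_agent("enhance_prompt", "text", "text")
text_to_image = make_agent("text_to_image", "text", "image")
image_to_video = make_agent("image_to_video", "image", "video")
extend_video = make_agent("extend_video", "video", "video")

# Each task is an ordered chain of agents (an assumed layout, for illustration).
PIPELINES: Dict[str, List[Agent]] = {
    "text_to_video": [enhance_prompt, text_to_image, image_to_video],
    "extend_video": [extend_video],
}


def run_task(task: str, artifact: Artifact) -> Artifact:
    """Pass the artifact through every agent registered for the task."""
    for agent in PIPELINES[task]:
        artifact = agent(artifact)
    return artifact


result = run_task("text_to_video", Artifact("text", "a cat surfing"))
print(result.kind, result.history)
```

Because each agent only declares what it accepts and produces, supporting a new task in this sketch means registering a new chain in `PIPELINES` rather than changing any existing agent.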