7 May 2024 | Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu
Vidu is a high-performance text-to-video generator that can produce 1080p videos up to 16 seconds in length. It is a diffusion model with U-ViT as its backbone, which lets it leverage the scalability and long-sequence modeling capabilities of transformers. Vidu exhibits strong coherence and dynamism, generates both realistic and imaginative videos, and shows an understanding of professional photography techniques. It can handle various video lengths, maintain 3D consistency, and generate cuts, transitions, camera movements, lighting effects, and emotional portrayals. Vidu also demonstrates imaginative ability, generating scenes that do not exist in reality. Initial experiments on other controllable video generation tasks, such as canny-to-video generation, video prediction, and subject-driven generation, show promising results. In terms of generation quality, Vidu's performance is comparable to that of Sora, the most powerful reported text-to-video generator.
The authors acknowledge support from various institutions and projects.