Video-Infinity: Distributed Long Video Generation

24 Jun 2024 | Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang
Video-Infinity is a distributed inference pipeline for long video generation that parallelizes the denoising process across multiple GPUs. It addresses the two main obstacles to generating long videos on a single GPU: high memory requirements and long processing times.

The pipeline rests on two interconnected mechanisms. Clip parallelism splits the video into clips, assigns each clip to a GPU, and optimizes the sharing of context information across devices while minimizing communication overhead. Dual-scope attention modulates temporal self-attention so that each device balances local (within-clip) and global (cross-clip) context efficiently, keeping the generated video coherent across devices. Together, these mechanisms reduce the memory overhead of temporal attention from quadratic to linear in video length, enabling the generation of videos of arbitrary length.

On a setup with 8 Nvidia RTX 6000 Ada GPUs, Video-Infinity generates videos of up to 2,300 frames in approximately 5 minutes. Experiments show that it outperforms existing methods in both video length and generation speed: its 2,300-frame output is 8.2 times longer than OpenSora V1.1's maximum, and generation is over 100 times faster than Streaming T2V, while also achieving better video quality and consistency. To the authors' knowledge, Video-Infinity is the first method to address long video generation through distributed parallel computation.
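As a rough illustration of how the two mechanisms fit together, the sketch below simulates clip parallelism on a single process and applies a dual-scope temporal attention over each clip's local frames plus a sparse set of global frames gathered from all clips. This is a minimal sketch under stated assumptions, not the paper's implementation: the names (`dual_scope_attention`, `frames_per_clip`, the subsampling stride) are hypothetical, and in the real multi-GPU pipeline the context exchange would be a distributed collective (e.g., an all-gather) rather than a Python list comprehension.

```python
import torch

def dual_scope_attention(local_frames: torch.Tensor,
                         global_frames: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of dual-scope temporal self-attention.

    local_frames:  (T_local, C)  frame features held by this device
    global_frames: (T_global, C) coarse context frames gathered from all devices

    Each local frame attends over its own clip (local scope) concatenated
    with a sparse subsample of every clip (global scope), so the attention
    cost grows linearly with total video length instead of quadratically.
    """
    context = torch.cat([local_frames, global_frames], dim=0)  # (T_local + T_global, C)
    scale = context.shape[-1] ** 0.5
    attn = torch.softmax(local_frames @ context.T / scale, dim=-1)
    return attn @ context                                      # (T_local, C)

# Simulate clip parallelism on one process: each "device" holds one clip
# and contributes a strided subsample of its frames as shared global context.
num_devices, frames_per_clip, dim, stride = 4, 16, 64, 8
clips = [torch.randn(frames_per_clip, dim) for _ in range(num_devices)]

# Stand-in for the cross-GPU collective that shares context between devices.
global_ctx = torch.cat([clip[::stride] for clip in clips], dim=0)

outputs = [dual_scope_attention(clip, global_ctx) for clip in clips]
print(outputs[0].shape)  # torch.Size([16, 64])
```

Note how only the small strided subsample crosses device boundaries: each clip's queries stay local, which is what keeps the communication overhead low and the memory footprint linear in the number of clips.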