Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

5 Jun 2024 | Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao
Lumina-Next improves on Lumina-T2X in both generation quality and efficiency. It introduces Next-DiT with 3D RoPE and sandwich normalization, enabling better resolution extrapolation and multilingual generation. The framework also incorporates Frequency- and Time-Aware Scaled RoPE for improved resolution extrapolation and detail preservation. Optimized time schedules and higher-order solvers reduce the number of sampling steps, while Time-Aware Context Drop merges redundant tokens for faster inference. Lumina-Next demonstrates strong performance across text-to-image, multi-view, audio, music, and point cloud generation, and supports zero-shot multilingual generation by using decoder-based LLMs as text encoders. The framework is versatile, handling a range of modalities and resolutions. All code and model weights are released for further research and development.
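To make the sandwich-normalization idea concrete, the sketch below shows a transformer block where each sub-layer (attention, MLP) is wrapped by a normalization both before and after it, with the post-norm applied to the sub-layer output prior to the residual addition. The module names, the use of LayerNorm, and the overall block layout here are illustrative assumptions, not the actual Next-DiT implementation.

```python
import torch
import torch.nn as nn


class SandwichBlock(nn.Module):
    """Minimal sketch of sandwich normalization in a transformer block.

    Each sub-layer output is normalized before being added back to the
    residual stream, which keeps activation magnitudes bounded. Hypothetical
    names and layout; Next-DiT's actual block differs in details.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Pre- and post-norms around the attention sub-layer.
        self.attn_pre_norm = nn.LayerNorm(dim)
        self.attn_post_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        # Pre- and post-norms around the MLP sub-layer.
        self.mlp_pre_norm = nn.LayerNorm(dim)
        self.mlp_post_norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention: pre-norm -> attention -> post-norm -> residual add.
        h = self.attn_pre_norm(x)
        h, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.attn_post_norm(h)

        # MLP: same sandwich pattern around the feed-forward sub-layer.
        h = self.mlp_pre_norm(x)
        x = x + self.mlp_post_norm(self.mlp(h))
        return x


# Example usage: a batch of 2 sequences, 16 tokens each, hidden size 256.
block = SandwichBlock(dim=256, num_heads=8)
y = block(torch.randn(2, 16, 256))
```

Normalizing each sub-layer's output before the residual addition is the usual motivation for this pattern: it limits activation growth across depth, which is helpful when the model is pushed beyond its training resolution.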