MoonShot: Towards Controllable Video Generation and Editing with Multimodal Conditions

3 Jan 2024 | David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo
MoonShot is a novel video generation model that conditions on both image and text inputs, leveraging a multimodal video block (MVB) to enhance visual appearance and geometric structure control. The MVB consists of spatial-temporal layers for video feature representation and a decoupled cross-attention layer for conditioning on both text and image inputs. This design allows for the integration of pre-trained image ControlNet modules without additional training, enabling precise control over geometric structures. MoonShot demonstrates superior performance in various applications, including personalized video generation, image animation, and video editing, outperforming existing methods in terms of visual quality and temporal consistency. The model's versatile architecture and conditioning mechanisms make it a promising foundation for controllable video generation.
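To make the decoupled cross-attention idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that "decoupled" means the video features (queries) attend to text tokens and image tokens through two separate attention passes, each with its own keys and values, and that the two results are summed. The function names and the `image_scale` weight are hypothetical, and the learned projection layers are omitted: keys and values are assumed to be already projected per modality.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, then normalized weights.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, image_scale=1.0):
    """Attend to text and image conditions separately, then sum.

    text_kv and image_kv are (keys, values) pairs. image_scale is a
    hypothetical knob for weighting the image branch (an assumption,
    not taken from the source).
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [[t + image_scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]


# Toy usage: two query vectors, one text token, one image token.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_kv = ([[1.0, 0.0]], [[2.0, 3.0]])
image_kv = ([[0.0, 1.0]], [[10.0, 20.0]])
result = decoupled_cross_attention(queries, text_kv, image_kv, image_scale=0.5)
```

Under this assumption the text branch is left untouched by the image branch, so setting `image_scale` to 0 reduces the layer to ordinary text-only cross-attention.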