14 Dec 2017 | Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz
MoCoGAN is a generative adversarial network (GAN) framework for video generation that decomposes a video into content and motion components. The framework generates a video by mapping a sequence of random vectors to a sequence of video frames, where each random vector consists of a content part and a motion part. The content part is sampled from a Gaussian distribution and kept fixed for the whole video, while the motion part is modeled as a stochastic process realized by a recurrent neural network. MoCoGAN is trained with an adversarial learning scheme that uses both an image discriminator and a video discriminator, and it learns the motion and content decomposition in an unsupervised manner.

Decomposing motion from content enables more controlled generation: MoCoGAN can synthesize videos with the same content but different motion, or with different content but the same motion. The framework also extends to image-to-video translation and categorical video generation.

MoCoGAN is evaluated on several challenging benchmark datasets, including shape motion, facial expression, Tai-Chi, and UCF101, where it outperforms state-of-the-art approaches in video generation quality and temporal consistency. It produces more realistic videos and offers better control over motion and content than competing methods, and in a user study participants preferred MoCoGAN-generated videos over those generated by other methods.
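To make the content/motion decomposition concrete, below is a minimal PyTorch-style sketch of a MoCoGAN-like generator. It is an illustration under assumed hyperparameters (latent sizes, a GRU cell for the motion process, a toy fully connected frame decoder at 64x64 resolution), not the authors' exact architecture; the class and argument names are hypothetical.

```python
# Minimal sketch of a MoCoGAN-style generator (assumed architecture, not the
# authors' exact one): a content code sampled once per video from a Gaussian,
# and a motion code evolved as a stochastic process by a recurrent network.
import torch
import torch.nn as nn

class MotionContentGenerator(nn.Module):
    def __init__(self, dim_content=50, dim_motion=10, dim_noise=10, channels=3):
        super().__init__()
        self.dim_content = dim_content
        self.dim_motion = dim_motion
        self.dim_noise = dim_noise
        self.channels = channels
        # Recurrent network mapping i.i.d. noise to a sequence of motion codes.
        self.motion_rnn = nn.GRUCell(dim_noise, dim_motion)
        # Per-frame image generator conditioned on [content ; motion].
        self.frame_generator = nn.Sequential(
            nn.Linear(dim_content + dim_motion, 256),
            nn.ReLU(),
            nn.Linear(256, channels * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, batch_size, video_length):
        # Content code: sampled once per video from a Gaussian and held fixed.
        z_content = torch.randn(batch_size, self.dim_content)
        h = torch.zeros(batch_size, self.dim_motion)  # initial recurrent state
        frames = []
        for _ in range(video_length):
            # Motion code: driven by fresh noise at every time step.
            eps = torch.randn(batch_size, self.dim_noise)
            h = self.motion_rnn(eps, h)
            z = torch.cat([z_content, h], dim=1)
            frame = self.frame_generator(z).view(batch_size, self.channels, 64, 64)
            frames.append(frame)
        # Video tensor of shape (batch, time, channels, height, width).
        return torch.stack(frames, dim=1)

# Usage: generate a batch of 4 videos with 16 frames each.
videos = MotionContentGenerator()(batch_size=4, video_length=16)
```

During adversarial training, the image discriminator scores individual frames sampled from a clip while the video discriminator scores the clip as a whole; holding the content code fixed across frames while only the motion code changes is what allows the two discriminators to drive the unsupervised separation of content and motion.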