26 Oct 2016 | Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
This paper presents a generative adversarial network (GAN) that learns scene dynamics from large amounts of unlabeled video. The generator uses a spatio-temporal convolutional architecture with two streams that separate foreground from background, which lets it produce short videos with plausible motion. The model is evaluated on both video generation and action recognition: it generates videos with more realistic dynamics and motion than simple baselines, and its learned features transfer to action recognition with minimal supervision, suggesting that scene dynamics are a promising signal for representation learning. The authors also apply the model to future prediction, generating plausible video continuations from a single static image. They conclude that generative video models could impact many applications in video understanding and simulation.
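To make the two-stream idea concrete: in the paper, the generator composites a moving foreground over a static background using a learned soft mask, so that G(z) = m(z) ⊙ f(z) + (1 − m(z)) ⊙ b(z). Below is a minimal sketch of such a generator in PyTorch. The latent dimension, layer widths, and output resolution are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of a two-stream video generator (not the authors' code).
# Layer sizes and the 16-frame, 32x32 output are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStreamGenerator(nn.Module):
    """Maps a latent code z to a video by compositing a moving
    foreground over a static background with a learned mask."""
    def __init__(self, z_dim=100):
        super().__init__()
        # Foreground stream: 3D transposed convolutions over (time, H, W)
        # produce a foreground video and a per-pixel mask.
        self.fg = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(256), nn.ReLU(True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(True),
        )
        self.fg_video = nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1)
        self.fg_mask = nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1)
        # Background stream: 2D transposed convolutions produce a single
        # static image, later replicated across every frame.
        self.bg = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fg(z.view(z.size(0), -1, 1, 1, 1))
        f = torch.tanh(self.fg_video(h))           # foreground video
        m = torch.sigmoid(self.fg_mask(h))         # soft mask in [0, 1]
        b = self.bg(z.view(z.size(0), -1, 1, 1))   # static background image
        b = b.unsqueeze(2).expand_as(f)            # broadcast over time
        return m * f + (1 - m) * b                 # composite: m*f + (1-m)*b

z = torch.randn(1, 100)
video = TwoStreamGenerator()(z)
print(video.shape)  # (1, 3, 16, 32, 32): batch, channels, frames, H, W
```

The compositing step is the key design choice: because the background stream can only emit one image, any motion the discriminator rewards must be explained by the foreground stream and its mask, which is what encourages the foreground/background separation the summary describes.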