Unsupervised Learning for Physical Interaction through Video Prediction


9 Jun 2016 | Chelsea Finn, Ian Goodfellow, Sergey Levine
This paper presents an action-conditioned video prediction model for physical interaction that learns without any labeled data. Instead of reconstructing frames from scratch, the model explicitly predicts a distribution over pixel motion from previous frames, which allows it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, the authors also introduce a dataset of 50,000 robot pushing interactions, totaling 1.4 million frames with the corresponding robot actions and including a test set with novel objects.

The model comes in three motion prediction variants: Dynamic Neural Advection (DNA), Convolutional Dynamic Neural Advection (CDNA), and Spatial Transformer Predictors (STP). Each variant predicts the next frame by transforming pixels from the previous image and compositing the transformed results with predicted masks, as sketched in the code below. The core network is built from stacked convolutional LSTMs, which make recurrent, multi-step video prediction practical.

Evaluated on the robot pushing dataset and on a human motion video dataset, the model produces more accurate video predictions and better predicts object motion than prior methods, both quantitatively and qualitatively. It can generate plausible video sequences more than 10 time steps into the future, corresponding to roughly one second, and remains effective on previously unseen objects.

The authors conclude that the method is a key building block for intelligent interactive systems, enabling agents to imagine different futures depending on the actions available to them. A notable property is that, by predicting pixel motion, the model implicitly groups pixels that belong to the same object. It does not, however, extract an explicit object-centric internal representation, which the authors identify as a promising direction for applying efficient reinforcement learning algorithms.
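The transform-and-composite idea at the heart of CDNA can be illustrated with a short, self-contained sketch. This is not the authors' released implementation; the function name `cdna_transform`, the tensor shapes, and the use of PyTorch are assumptions made for illustration. The sketch takes the previous frame, a set of predicted motion kernels, and compositing masks (assumed to already sum to one across the mask channel, e.g. via a softmax), convolves the previous frame with each kernel, and blends the results.

```python
import torch
import torch.nn.functional as F

def cdna_transform(prev_frame, kernels, masks):
    """Illustrative CDNA-style step (not the paper's exact code).

    prev_frame: (B, C, H, W) previous frame.
    kernels:    (B, N, k, k) N predicted, normalized motion kernels per example.
    masks:      (B, N + 1, H, W) compositing masks summing to 1 over channel 1;
                the extra mask weights the untransformed previous frame.
    Returns:    (B, C, H, W) predicted next frame.
    """
    B, C, H, W = prev_frame.shape
    _, N, k, _ = kernels.shape
    pad = k // 2  # assumes an odd kernel size so spatial size is preserved

    transformed = [prev_frame]  # the first mask keeps the original frame
    for i in range(N):
        # Convolve every colour channel of each example with that example's
        # i-th kernel, using one convolution group per (example, channel).
        inp = prev_frame.reshape(1, B * C, H, W)
        w = kernels[:, i].unsqueeze(1).repeat_interleave(C, dim=0)  # (B*C, 1, k, k)
        out = F.conv2d(inp, w, padding=pad, groups=B * C)
        transformed.append(out.reshape(B, C, H, W))

    # Composite: weight each transformed image by its mask and sum.
    stacked = torch.stack(transformed, dim=1)           # (B, N+1, C, H, W)
    return (stacked * masks.unsqueeze(2)).sum(dim=1)    # (B, C, H, W)
```

Because every pixel of the output is a convex combination of pixels moved from the previous frame, the predictor never has to invent the appearance of an object, which is one intuition for why this formulation generalizes to novel objects.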
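For the recurrent backbone, a convolutional LSTM replaces the matrix multiplications of a standard LSTM with 2-D convolutions so that the hidden state keeps its spatial structure. The minimal cell below is a generic sketch of this idea, not the paper's exact architecture; the layer sizes, kernel size, and class name `ConvLSTMCell` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: standard LSTM gating, but the
    input-to-state and state-to-state transitions are 2-D convolutions."""

    def __init__(self, in_channels, hidden_channels, kernel_size=5):
        super().__init__()
        padding = kernel_size // 2
        # A single convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, (h, c)
```

In multi-step prediction, each predicted frame is fed back as the next input, so the recurrent state has to carry enough information to keep rollouts coherent for ten or more time steps, as reported in the paper.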