6 May 2017 | Julieta Martinez, Michael J. Black, and Javier Romero
The paper "On Human Motion Prediction Using Recurrent Neural Networks" by Julieta Martinez, Michael J. Black, and Javier Romero explores the use of deep recurrent neural networks (RNNs) for modeling human motion, particularly focusing on short-term motion prediction and long-term motion synthesis. The authors critique recent work in this area, noting that while deep RNNs have shown promise, they often suffer from discontinuities in short-term predictions and unrealistic artifacts in long-term predictions. They propose a simple yet effective baseline model that outperforms state-of-the-art methods, which they attribute to the baseline's ability to handle discontinuities more robustly.
The paper identifies several issues with current RNN approaches, including the need for extensive hyper-parameter tuning, the complexity of deep multi-layer models, and the limited size of available training datasets. To address these issues, the authors propose a sequence-to-sequence architecture with a residual connection on the decoder and a sampling-based loss, in which the decoder is fed its own predictions during training. The architecture uses a single GRU cell, dispenses with spatial encoding layers, and can be trained efficiently on large datasets such as Human 3.6M.
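A minimal sketch of this idea in PyTorch, assuming a flat joint-angle vector per frame; the class name, hidden size, and pose dimensionality are illustrative assumptions rather than the authors' released implementation. The decoder consumes its own outputs (matching the sampling-based loss) and predicts a per-frame delta that is added to the previous pose (the residual connection):

```python
import torch
import torch.nn as nn

class ResidualSeq2Seq(nn.Module):
    """Single-GRU sequence-to-sequence model with a residual connection:
    the network predicts a delta that is added to the previous frame."""

    def __init__(self, pose_dim=54, hidden_dim=1024):
        super().__init__()
        self.gru = nn.GRUCell(pose_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, observed, horizon):
        # observed: (T_obs, B, pose_dim) conditioning frames
        batch = observed.shape[1]
        h = observed.new_zeros(batch, self.gru.hidden_size)

        # Encode: run the GRU over the observed frames.
        for frame in observed:
            h = self.gru(frame, h)

        # Decode: feed the model's own predictions back in (sampling-based
        # training), with a residual connection so the GRU models velocities.
        prev = observed[-1]
        predictions = []
        for _ in range(horizon):
            h = self.gru(prev, h)
            prev = prev + self.out(h)   # residual: predict a frame-to-frame delta
            predictions.append(prev)
        return torch.stack(predictions)  # (horizon, B, pose_dim)

# Usage: 50 conditioning frames, batch of 16, predict 25 future frames.
model = ResidualSeq2Seq()
observed = torch.randn(50, 16, 54)
future = model(observed, horizon=25)
loss = nn.functional.mse_loss(future, torch.randn_like(future))  # vs. ground truth
```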
Experiments demonstrate that the proposed method achieves state-of-the-art performance in short-term motion prediction while maintaining plausible long-term motion. The authors also find that training on multiple actions improves performance, highlighting the importance of diverse training data. The paper concludes by discussing the challenges of short-term and long-term motion prediction using RNNs and suggests future directions for research, including the use of larger, unsupervised datasets.
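For reference, short-term prediction on Human 3.6M is commonly scored as the Euclidean distance between predicted and ground-truth poses in Euler-angle space at fixed horizons (e.g. 80, 160, 320, and 400 ms). A minimal sketch of that per-frame error, with array shapes and the 25 fps frame rate as assumptions:

```python
import numpy as np

def euler_angle_error(pred, gt):
    """Per-frame Euclidean distance in Euler-angle space.

    pred, gt: arrays of shape (T, D), one D-dimensional Euler-angle pose
    per predicted frame. Returns a (T,) array of errors; the standard
    protocol reports selected horizons averaged over test sequences.
    """
    return np.linalg.norm(pred - gt, axis=1)

# Example: report errors at roughly 80/160/320/400 ms, assuming 25 fps data.
errors = euler_angle_error(np.random.randn(10, 54), np.random.randn(10, 54))
for ms, idx in [(80, 1), (160, 3), (320, 7), (400, 9)]:
    print(f"{ms} ms: {errors[idx]:.3f}")
```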