11 Apr 2016 | Ashesh Jain, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena
The paper "Structural-RNN: Deep Learning on Spatio-Temporal Graphs" by Ashesh Jain, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena proposes a novel approach to combining high-level spatio-temporal graphs with the sequence modeling capabilities of Recurrent Neural Networks (RNNs). The authors develop a scalable method to transform any spatio-temporal graph into a rich, feedforward, and fully differentiable RNN mixture, which can be jointly trained. This method is generic and principled, making it applicable to a wide range of problems. The paper evaluates the proposed approach on various spatio-temporal tasks, including human motion modeling, human-object interaction, and driver decision making, demonstrating significant improvements over state-of-the-art methods. The authors also provide insights into the complexity and convergence properties of the Structural-RNN (S-RNN) and visualize its memory cells to reveal interesting semantic operations. The contributions of the paper include a generic method for transforming st-graphs into RNN mixtures, demonstrating the superiority of structured approaches over unstructured RNNs, and showing that modeling structure with S-RNN outperforms non-deep learning-based structured methods.The paper "Structural-RNN: Deep Learning on Spatio-Temporal Graphs" by Ashesh Jain, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena proposes a novel approach to combining high-level spatio-temporal graphs with the sequence modeling capabilities of Recurrent Neural Networks (RNNs). The authors develop a scalable method to transform any spatio-temporal graph into a rich, feedforward, and fully differentiable RNN mixture, which can be jointly trained. This method is generic and principled, making it applicable to a wide range of problems. The paper evaluates the proposed approach on various spatio-temporal tasks, including human motion modeling, human-object interaction, and driver decision making, demonstrating significant improvements over state-of-the-art methods. The authors also provide insights into the complexity and convergence properties of the Structural-RNN (S-RNN) and visualize its memory cells to reveal interesting semantic operations. The contributions of the paper include a generic method for transforming st-graphs into RNN mixtures, demonstrating the superiority of structured approaches over unstructured RNNs, and showing that modeling structure with S-RNN outperforms non-deep learning-based structured methods.