June 5th, 2015 | Zachary C. Lipton, John Berkowitz, Charles Elkan
This paper reviews and synthesizes research on recurrent neural networks (RNNs) for sequence learning over the past three decades. RNNs are connectionist models that capture the dynamics of sequences via cycles in the network. Unlike standard feedforward networks, RNNs retain a state that can represent information from an arbitrarily long context window. While RNNs have traditionally been difficult to train, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful large-scale learning. Systems based on long short-term memory (LSTM) and bidirectional recurrent neural network (BRNN) architectures have demonstrated groundbreaking performance on tasks such as image captioning, machine translation, and handwriting recognition.
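To make the retained state concrete, here is a minimal sketch of a simple RNN step, illustrative only and not code from the paper; the weight names (W_xh, W_hh, W_hy) and bias terms are assumptions.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
        # The new hidden state mixes the current input with the previous state,
        # which is how an RNN carries information from earlier in the sequence.
        h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
        y_t = W_hy @ h_t + b_y
        return h_t, y_t

    def rnn_forward(xs, h0, params):
        # Unroll over the whole sequence; the same parameters are reused at every step.
        h, outputs = h0, []
        for x_t in xs:
            h, y_t = rnn_step(x_t, h, *params)
            outputs.append(y_t)
        return outputs, h

In this view the hidden state is the only conduit between time steps, which is also why gradients propagated through many steps can vanish or explode, a difficulty that motivates the LSTM discussed below.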
RNNs are particularly useful for modeling data with temporal or sequential structure and with varying input and output lengths. They can handle inputs and/or outputs consisting of sequences of elements that are not independent, and they can capture dependencies at multiple time scales simultaneously. In particular, RNNs can represent long-range temporal dependencies, overcoming the chief limitation of Markov models. Like other expressive models, RNNs are prone to overfitting, but they remain amenable to gradient-based training and can be regularized with techniques such as weight decay, dropout, and limiting the model's degrees of freedom.
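As a rough illustration of the regularizers mentioned above (a sketch under assumed names, not the paper's recipe), weight decay can be folded directly into a gradient update, and dropout can be applied to activations during training:

    import numpy as np

    def sgd_step(params, grads, lr=0.01, weight_decay=1e-4):
        # L2 weight decay shrinks every weight slightly at each update,
        # discouraging the large weights associated with overfitting.
        for name in params:
            params[name] -= lr * (grads[name] + weight_decay * params[name])
        return params

    def dropout(activations, keep_prob=0.5, rng=None):
        # Randomly zero out units during training, scaled so the expected value
        # is unchanged; at test time this function is simply skipped.
        rng = rng if rng is not None else np.random.default_rng()
        mask = rng.random(activations.shape) < keep_prob
        return activations * mask / keep_prob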
The paper positions itself relative to the prior literature, noting that research on RNNs can seem impenetrable to the uninitiated. It aims to provide a readable, intuitive, consistently notated, and reasonably comprehensive yet selective survey of research on RNNs for learning with sequences. It emphasizes architectures, algorithms, and results, but also distills the intuitions that have guided this largely heuristic and empirical field, offering qualitative arguments, a historical perspective, and comparisons to alternative methodologies where appropriate.
The paper introduces formal notation and provides a brief background on neural networks, covering sequences, feedforward networks, and backpropagation. It then turns to recurrent neural networks: their early designs, how they are trained, and modern architectures. In particular, it explains the LSTM and BRNN architectures, which have become the most successful RNN architectures for sequence learning. The LSTM model was introduced primarily to overcome the problem of vanishing gradients, while the BRNN introduces an architecture in which information from both the past and the future is used to determine the output at each point in the sequence. The paper concludes that RNNs, particularly LSTMs and BRNNs, show a superior ability to learn long-range dependencies compared with simple RNNs.
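The following sketch shows both ideas in miniature. It is illustrative only: the stacked-gate parameterization and helper names are assumptions, and the gate equations follow the standard LSTM formulation rather than any specific variant from the paper. The LSTM's largely additive cell-state update is what lets gradient signals survive over long spans, and the BRNN pattern runs one recurrence forward and one backward, combining their states at each position.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b hold the stacked parameters for the input (i), forget (f),
        # and output (o) gates plus the candidate cell value (g).
        z = W @ x_t + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c_t = f * c_prev + i * g       # additive cell update: gradients flow farther
        h_t = o * np.tanh(c_t)         # gated hidden state exposed to the next layer
        return h_t, c_t

    def bidirectional_pass(xs, params_fwd, params_bwd, h0, c0):
        # BRNN idea: one pass over the sequence left-to-right (the past) and one
        # right-to-left (the future); the two states are concatenated per position.
        def run(seq, params):
            h, c, states = h0, c0, []
            for x in seq:
                h, c = lstm_step(x, h, c, *params)
                states.append(h)
            return states
        forward = run(xs, params_fwd)
        backward = run(list(reversed(xs)), params_bwd)
        return [np.concatenate([f, b]) for f, b in zip(forward, reversed(backward))]

Either direction alone sees only one side of the context; concatenating the two states gives each output access to the whole sequence, which is why bidirectional models suit tasks such as handwriting recognition, where the full input is available before any output must be produced.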