19 Feb 2015 | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
This paper presents a regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units, addressing the issue that Dropout, a successful regularization method for feedforward neural networks, does not work well with RNNs. The authors propose applying Dropout only to the non-recurrent connections of the LSTM, which reduces overfitting without compromising the LSTM's ability to memorize long-term information. The technique is evaluated on several tasks, including language modeling, speech recognition, image caption generation, and machine translation, and shows significant improvements in performance. The paper also discusses the architecture and dynamics of LSTMs and provides detailed experimental results to support the effectiveness of the proposed regularization method.
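To make the core idea concrete, here is a minimal sketch (not the authors' code) in PyTorch of a stacked LSTM where dropout is applied only to the non-recurrent connections: the inputs to each layer and the final output, but never the hidden and cell states carried across time steps. The class name, layer sizes, and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RegularizedLSTM(nn.Module):
    """Sketch of dropout on non-recurrent LSTM connections only (assumed names/sizes)."""

    def __init__(self, vocab_size, hidden_size, dropout_p=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.layer1 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.layer2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout_p)   # applied only between layers, not across time
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        # Non-recurrent connection: embedding -> first LSTM layer.
        x = self.dropout(self.embed(tokens))
        # Recurrent h/c connections inside each nn.LSTM are left untouched.
        h1, _ = self.layer1(x)
        # Non-recurrent connection: layer 1 output -> layer 2 input.
        h2, _ = self.layer2(self.dropout(h1))
        # Non-recurrent connection: top layer output -> softmax decoder.
        return self.decoder(self.dropout(h2))

# Example usage (shapes only; vocabulary and batch are placeholders):
model = RegularizedLSTM(vocab_size=10000, hidden_size=256)
logits = model(torch.randint(0, 10000, (8, 35)))  # (batch=8, seq_len=35, vocab)
```

Because the recurrent state is never dropped, information can still flow across many time steps, while the noise injected on the layer-to-layer connections regularizes the network in the same spirit as standard feedforward Dropout.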