19 Feb 2015 | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
This paper presents a regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, a widely used regularization method for feedforward neural networks, does not work well when applied naively to RNNs and LSTMs. The authors show how to apply dropout to LSTMs correctly, significantly reducing overfitting on tasks such as language modeling, speech recognition, image caption generation, and machine translation.
The key idea is to apply dropout only to the non-recurrent connections of the LSTM, not to the recurrent ones. Concretely, dropout is applied to the activations each layer receives from the layer below, while the hidden state and memory cell that carry information from one timestep to the next are left untouched. This lets the LSTM keep its ability to memorize information over long time intervals while still benefiting from dropout regularization, as illustrated in the sketch below.
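To make the distinction concrete, here is a minimal NumPy sketch of a single timestep of one LSTM layer written in this style; the function and variable names (`lstm_step`, `x_below`, and so on) are illustrative choices, not the authors' code. Dropout is applied only to the activation arriving from the layer below, while the recurrent hidden state and memory cell pass through unmasked.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dropout(x, rate, rng, train=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale at train time."""
    if not train or rate == 0.0:
        return x
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

def lstm_step(x_below, h_prev, c_prev, W, b, rate, rng, train=True):
    """One timestep of one LSTM layer, with dropout on the non-recurrent input only.

    x_below        : activation from the layer below at this timestep
                     (non-recurrent connection -- this is where dropout is applied).
    h_prev, c_prev : this layer's hidden state and memory cell from the previous
                     timestep (recurrent connections -- never masked).
    W, b           : gate weights, shapes (len(x_below) + H, 4 * H) and (4 * H,).
    """
    x_below = dropout(x_below, rate, rng, train)       # only the non-recurrent path
    z = np.concatenate([x_below, h_prev]) @ W + b      # pre-activations for all 4 gates
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g                             # memory cell: untouched by dropout
    h = o * np.tanh(c)
    return h, c
```

In a stacked LSTM, each layer's output `h` becomes the `x_below` of the layer above, so a fresh dropout mask is drawn at every layer and timestep, while information flowing through `c` and `h` within a layer is never perturbed. This is also the scheme most frameworks expose; for example, the `dropout` argument of PyTorch's `nn.LSTM` applies dropout to the outputs of every layer except the last.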
In language modeling, the authors evaluated their method on the Penn Tree Bank dataset, where regularized LSTMs achieved substantially lower word-level perplexity than non-regularized ones. In speech recognition, dropout improved frame accuracy on an Icelandic speech dataset. In machine translation, it improved translation quality on the WMT'14 English-to-French task. In image caption generation, applying the same dropout scheme to the model of Vinyals et al. (2014) also improved performance.
The authors conclude that their method provides a simple and effective way to apply dropout to LSTMs, yielding significant performance improvements across these tasks. Their results suggest that, applied correctly, dropout is a useful regularization technique for RNNs and could improve performance on a wide variety of applications.