Generating Text with Recurrent Neural Networks

2011 | Ilya Sutskever, James Martens, Geoffrey Hinton
This paper studies character-level language modeling with Recurrent Neural Networks (RNNs) trained using the Hessian-Free (HF) optimizer. Despite their expressive power, RNNs have historically been difficult to train because of the "vanishing/exploding gradients" problem. Recent advances in HF optimization make it possible to train RNNs effectively, allowing them to be applied to challenging sequence problems.

The authors introduce a new RNN variant, the Multiplicative RNN (MRNN), which uses multiplicative connections to let the current input character determine the transition matrix from one hidden state vector to the next. After training the MRNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, the authors surpassed the best previous single method for character-level language modeling, a hierarchical non-parametric sequence model, and the MRNN outperformed the sequence memoizer and PAQ on three real-world language datasets. The trained MRNN generates text with a large vocabulary, considerable grammatical structure, and a wide variety of highly plausible proper names not present in the training set. It can also balance parentheses and quotes over long distances, a task that is difficult for other models. The authors conclude that the MRNN is a powerful model for character-level language modeling and that further research is needed to explore its potential.
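The multiplicative connection described above can be sketched as a single forward step: the current character gates a set of factor units, so the effective hidden-to-hidden transition matrix depends on the input. The following is a minimal illustrative sketch of the factored form, not the paper's implementation; the dimensions, initialization scales, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, F = 5, 8, 6   # vocab size, hidden units, factor units (illustrative)

W_fx = rng.normal(scale=0.1, size=(F, V))  # input  -> factors
W_fh = rng.normal(scale=0.1, size=(F, H))  # hidden -> factors
W_hf = rng.normal(scale=0.1, size=(H, F))  # factors -> hidden
W_hx = rng.normal(scale=0.1, size=(H, V))  # input  -> hidden
W_oh = rng.normal(scale=0.1, size=(V, H))  # hidden -> next-char logits

def mrnn_step(x_onehot, h_prev):
    """One factored-MRNN step: the elementwise product makes the
    effective transition matrix a function of the current character."""
    f = (W_fx @ x_onehot) * (W_fh @ h_prev)   # multiplicative interaction
    h = np.tanh(W_hf @ f + W_hx @ x_onehot)   # new hidden state
    logits = W_oh @ h                          # scores for the next character
    return h, logits

x = np.zeros(V)
x[2] = 1.0                     # one-hot encoding of a character
h0 = np.zeros(H)               # initial hidden state
h1, logits = mrnn_step(x, h0)
```

Because the factor layer multiplies an input-dependent vector with a hidden-dependent vector, each character effectively selects its own (rank-limited) transition matrix, which is what distinguishes the MRNN from a standard RNN with additive input connections.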