Generating Text with Recurrent Neural Networks

2011 | Ilya Sutskever, James Martens, Geoffrey Hinton
This paper explores the application of Recurrent Neural Networks (RNNs) to character-level language modeling, a challenging task due to the difficulty in training RNNs effectively. The authors introduce a new RNN variant called the Multiplicative RNN (MRNN), which uses multiplicative connections to allow the current input character to influence the transition matrix from one hidden state to the next. This variant is trained using a Hessian-Free (HF) optimization algorithm, which overcomes the "vanishing/exploding gradients problem" that has historically plagued RNNs. The MRNN is tested on three large datasets: English Wikipedia, New York Times articles, and machine learning papers. The results show that the MRNN outperforms the sequence memoizer, a hierarchical non-parametric Bayesian method, and approaches the performance of PAQ8hp12, a powerful compression model. The paper also demonstrates that the MRNN can generate text with rich and plausible linguistic structures, including balanced parentheses and proper names not seen during training. The authors discuss the potential of MRNNs for improving text compression and assistive technologies for people with physical disabilities.
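To make the multiplicative connections concrete, the sketch below shows one forward step of an MRNN in NumPy: the current input character gates a factor layer, so the effective hidden-to-hidden transition (roughly W_hf · diag(W_fx x_t) · W_fh) depends on which character was just read. The layer sizes, initialization, and variable names here are illustrative assumptions, not the exact configuration reported in the paper.

```python
import numpy as np

# Minimal sketch of one forward step of a Multiplicative RNN (MRNN).
# Dimensions, weight names, and the one-hot character encoding are
# illustrative assumptions rather than the paper's exact setup.

vocab_size = 86      # number of distinct characters (assumed)
hidden_size = 100    # hidden-state dimension (assumed)
factor_size = 100    # size of the multiplicative factor layer (assumed)

rng = np.random.default_rng(0)
scale = 0.01

# Parameters of the factored, character-dependent transition.
W_fx = rng.normal(0, scale, (factor_size, vocab_size))   # input -> factor gates
W_fh = rng.normal(0, scale, (factor_size, hidden_size))  # previous hidden -> factors
W_hf = rng.normal(0, scale, (hidden_size, factor_size))  # factors -> hidden
W_hx = rng.normal(0, scale, (hidden_size, vocab_size))   # input -> hidden (additive path)
W_oh = rng.normal(0, scale, (vocab_size, hidden_size))   # hidden -> next-character logits
b_h = np.zeros(hidden_size)
b_o = np.zeros(vocab_size)


def mrnn_step(x_t, h_prev):
    """One MRNN step: the current character x_t (one-hot) gates the factor
    units, so the effective hidden-to-hidden matrix depends on x_t."""
    gates = W_fx @ x_t                   # character-dependent gates on the factors
    factors = gates * (W_fh @ h_prev)    # elementwise gating of the recurrent signal
    h_t = np.tanh(W_hf @ factors + W_hx @ x_t + b_h)
    logits = W_oh @ h_t + b_o
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over the next character
    return h_t, probs


# Usage: run one step on a random one-hot character from a zero hidden state.
x = np.zeros(vocab_size)
x[rng.integers(vocab_size)] = 1.0
h = np.zeros(hidden_size)
h, next_char_probs = mrnn_step(x, h)
print(next_char_probs.shape)  # (vocab_size,)
```

Sampling text would simply repeat this step, drawing each next character from `next_char_probs` and feeding it back in as the new one-hot input; training the parameters with Hessian-Free optimization is a separate matter not shown here.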