26 Sep 2016 | Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
This paper introduces the pointer sentinel mixture model for neural sequence models, which combines the strengths of standard softmax classifiers with pointer networks to improve language modeling performance. The model can either reproduce a word from the recent context or generate a word from a standard softmax classifier. The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. To evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora, the authors also introduce the freely available WikiText corpus.
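The paper combines the two output distributions as a weighted mixture. With gate g (defined by the sentinel, described below) and the softmax and pointer distributions written as p_vocab and p_ptr, the combination stated in the paper is:

```latex
% Mixture of the softmax-RNN and pointer components; g is the gate,
% given by the sentinel's share of the attention mass.
p(y_i \mid x_i) = g \, p_{\text{vocab}}(y_i \mid x_i) + (1 - g) \, p_{\text{ptr}}(y_i \mid x_i)
```

When g = 1 the model reduces to the standard softmax classifier; when g = 0 the next word is predicted entirely by copying from the recent context.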
The model mixes two components: a softmax-RNN component and a pointer network component. The softmax-RNN component uses a standard softmax classifier to predict the next word, while the pointer network component uses attention to select words from the recent input sequence. A gating function decides, at each time step, how much probability mass to place on each component; rather than a hard switch, the two distributions are mixed. The gate is influenced by both the RNN hidden state and the pointer window's hidden states: a learned sentinel vector competes with the window positions in the pointer's attention softmax, and the sentinel's share of the attention mass becomes the gate.
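Below is a minimal sketch of this gating computation, assuming a PyTorch setup; the function and tensor names (pointer_sentinel_mixture, window_hiddens, W_q, sentinel) are illustrative choices, not the authors' code:

```python
import torch
import torch.nn.functional as F

def pointer_sentinel_mixture(hidden, window_hiddens, vocab_logits, W_q, sentinel):
    """Gate a pointer distribution over the context window against a
    softmax over the vocabulary, using a learned sentinel vector.

    hidden:         (d,)    current RNN hidden state
    window_hiddens: (L, d)  hidden states for the last L context words
    vocab_logits:   (V,)    standard softmax classifier logits
    W_q:            (d, d)  projection producing the attention query
    sentinel:       (d,)    learned sentinel embedding
    """
    # Query is a projected (tanh) version of the current hidden state.
    query = torch.tanh(W_q @ hidden)                   # (d,)
    # Attention scores over window positions, plus one for the sentinel.
    scores = window_hiddens @ query                    # (L,)
    sentinel_score = (sentinel @ query).view(1)        # (1,)
    a = F.softmax(torch.cat([scores, sentinel_score]), dim=0)  # (L+1,)
    # The sentinel's probability mass is the gate g for the softmax part.
    g = a[-1]
    p_ptr_positions = a[:-1]                           # mass on window positions
    p_vocab = F.softmax(vocab_logits, dim=0)           # (V,)
    return g, p_ptr_positions, p_vocab
```

Note that the window portion of the attention vector already carries total mass 1 − g, so the pointer distribution needs no separate renormalization before mixing.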
The pointer sentinel mixture model is evaluated on the Penn Treebank and the WikiText corpus and achieves lower perplexity than competing models on both. It handles rare words more effectively than standard RNN language models: the pointer component can reproduce a rare word by selecting it directly from the recent input, and the same attention window helps the model capture long-term dependencies.
These properties also let the model cope with the larger vocabulary and longer contexts of WikiText, where standard RNN models fare worse, and they underpin its state-of-the-art results on both corpora.
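To see how copying recovers rare words, here is a hedged continuation of the sketch above: pointer mass at every window position holding a given word is summed into that word's final probability. The helper name mix_into_vocab and the tensor window_word_ids (the vocabulary id at each window position) are assumptions for illustration:

```python
def mix_into_vocab(g, p_ptr_positions, p_vocab, window_word_ids):
    """p(y) = g * p_vocab(y) + total pointer mass on positions holding y.

    window_word_ids: (L,) LongTensor of vocabulary ids in the window.
    """
    p = g * p_vocab                      # softmax component, already gated
    # Every occurrence of a word in the window adds its attention mass,
    # so repeated rare words accumulate probability the softmax lacks.
    p.scatter_add_(0, window_word_ids, p_ptr_positions)
    return p
```

If a rare word occurs several times in the window, each position contributes attention mass, so the pointer can assign it substantial probability even when the softmax component assigns it almost none.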