11 Feb 2016 | Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
This paper presents advances in large-scale language modeling with recurrent neural networks (RNNs). The authors explore techniques such as character-level convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to improve language models on the One Billion Word Benchmark. Their best single model reduces perplexity from 51.3 to 30.0 while using 20 times fewer parameters, and an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. The models are released for the NLP and ML community to study and improve upon.
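For reference, perplexity is the exponential of the average per-word negative log-likelihood on held-out text, so lower is better. A minimal sketch of how such a number is computed, using made-up probabilities rather than anything from the paper:

```python
import math

# Hypothetical per-word probabilities a language model assigns to a
# held-out four-word sequence (illustrative values only).
word_probs = [0.10, 0.02, 0.30, 0.05]

# Perplexity = exp(mean negative log-likelihood over the words).
nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
perplexity = math.exp(nll)
print(f"perplexity = {perplexity:.1f}")
```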
Language modeling is central to NLP and language understanding. Models that accurately predict sentence distributions encode language complexities and distill knowledge from corpora. Recent advances in deep learning and RNNs have enabled better language models, which improve downstream tasks like speech recognition and machine translation. Large-scale language models can compactly extract knowledge from training data, as demonstrated by models trained on movie subtitles that can generate answers to questions about objects and people.
The authors focus on the One Billion Word Benchmark, a large-scale language modeling task with a vocabulary of roughly 800,000 words. They explore techniques such as character-level CNN inputs and large LSTMs, and design a Softmax loss based on character-level CNNs that is efficient to train and as precise as a full Softmax. These techniques yield significant perplexity improvements for single models, and ensembling the best models improves results further.
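As a rough sketch of the character-level CNN word representation (filter sizes and dimensions below are illustrative assumptions, not the authors' exact hyperparameters), a word's embedding can be built by convolving over its character embeddings and max-pooling over positions:

```python
import torch
import torch.nn as nn

class CharCNNWordEmbedder(nn.Module):
    """Builds a word embedding from its characters (illustrative sizes only)."""
    def __init__(self, n_chars=256, char_dim=16, n_filters=128, kernel_size=5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Convolve over the character positions of each word.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):           # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)        # (batch, max_word_len, char_dim)
        x = x.transpose(1, 2)              # (batch, char_dim, max_word_len)
        x = torch.relu(self.conv(x))       # (batch, n_filters, max_word_len)
        return x.max(dim=2).values         # max-pool over positions -> (batch, n_filters)

# Toy usage: embed a batch of two words, each padded to 8 characters.
words = torch.randint(0, 256, (2, 8))
print(CharCNNWordEmbedder()(words).shape)  # torch.Size([2, 128])
```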
On the input side, the authors replace the word-embedding lookup table with character-level CNN embeddings, which give a smoother and more compact parametrization of word representations. On the output side, a CNN Softmax layer computes word logits from the same kind of character-level embeddings, greatly reducing the number of parameters while staying competitive with a full Softmax.
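A hedged sketch of the CNN Softmax idea: the logit for each candidate word is the dot product between the LSTM state and that word's character-CNN embedding, so the output embedding matrix is computed from characters rather than stored as a free |V| x d parameter table (the paper additionally learns a small per-word correction term, omitted here; all sizes below are illustrative):

```python
import torch

def cnn_softmax_logits(hidden, word_embs):
    """CNN Softmax logits (simplified sketch).

    hidden:    (batch, d)      LSTM outputs
    word_embs: (vocab_size, d) char-CNN embeddings of every vocabulary word,
                               computed on the fly instead of stored as a
                               free |V| x d parameter matrix
    """
    return hidden @ word_embs.t()   # (batch, vocab_size), fed to softmax / cross-entropy

# Toy usage with random tensors standing in for real activations; the real
# benchmark vocabulary is roughly 800k words, not 1000.
logits = cnn_softmax_logits(torch.randn(4, 1024), torch.randn(1000, 1024))
print(logits.shape)  # torch.Size([4, 1000])
```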
The authors also explore character-level LSTMs that predict the next word one character at a time, removing the need for a Softmax over the full word vocabulary, and find that combining word-level and character-level modeling can improve performance.
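A rough sketch of that idea (the wiring and sizes below are assumptions, not the paper's exact setup): a small LSTM scores the characters of the next word conditioned on the word-level context vector, so the output Softmax ranges over characters rather than words:

```python
import torch
import torch.nn as nn

class CharLSTMPredictor(nn.Module):
    """Scores the characters of the next word given a word-level context vector."""
    def __init__(self, n_chars=256, char_dim=32, context_dim=1024, hidden_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Condition the char LSTM on the word-level state by concatenating it
        # to every character input (one simple wiring choice among several).
        self.lstm = nn.LSTM(char_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)   # softmax over characters, not words

    def forward(self, context, char_ids):             # context: (batch, context_dim)
        x = self.char_emb(char_ids)                    # (batch, word_len, char_dim)
        ctx = context.unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.lstm(torch.cat([x, ctx], dim=2))
        return self.out(h)                             # (batch, word_len, n_chars) logits

# Toy usage: score the characters of the next word for a batch of 2 contexts.
logits = CharLSTMPredictor()(torch.randn(2, 1024), torch.randint(0, 256, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 256])
```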
Experiments on the One Billion Word Benchmark show that these models substantially improve perplexity over previously published results. Sampled-Softmax approximations such as importance sampling and noise contrastive estimation make training with the large output vocabulary tractable, and the CNN Softmax layer further reduces the parameter count.
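As a hedged illustration of why sampled losses help (this is a generic sampled-softmax sketch, not the authors' exact estimator): the cross-entropy is computed over the true word plus a small set of sampled negative words instead of the full ~800,000-word vocabulary:

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(hidden, softmax_weights, target_ids, n_samples=4096):
    """Approximate cross-entropy using the target word plus sampled negatives.

    hidden:          (batch, d)       LSTM outputs
    softmax_weights: (vocab_size, d)  output embedding matrix
    target_ids:      (batch,)         indices of the true next words
    """
    vocab_size = softmax_weights.size(0)
    # Uniform negative sampling for simplicity; the paper samples from a
    # unigram-like proposal and corrects for it (importance sampling).
    # Collisions between negatives and targets are ignored here.
    neg_ids = torch.randint(0, vocab_size, (n_samples,))
    cand_ids = torch.cat([target_ids, neg_ids])        # candidates shared by the batch
    logits = hidden @ softmax_weights[cand_ids].t()    # (batch, batch + n_samples)
    # The true word for example i is candidate i by construction.
    targets = torch.arange(hidden.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors; the toy vocab stands in for the benchmark's ~800k words.
loss = sampled_softmax_loss(torch.randn(8, 512), torch.randn(50000, 512),
                            torch.randint(0, 50000, (8,)))
print(loss.item())
```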
The authors conclude that RNN-based language models can be trained on very large amounts of data and clearly outperform competing models, that character-level CNN inputs and the CNN Softmax layer shrink the parameter count without sacrificing accuracy, and that ensembling several models yields a substantial further improvement. They hope their work and released models will inspire further research in large-scale language modeling.
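The ensemble result comes from mixing the per-word probabilities of several trained models. A minimal sketch with made-up probabilities and equal mixing weights (in practice the weights would be tuned on held-out data):

```python
import math

# Hypothetical probabilities three models assign to the same four-word sequence.
model_probs = [
    [0.10, 0.02, 0.30, 0.05],
    [0.12, 0.03, 0.25, 0.04],
    [0.08, 0.05, 0.28, 0.06],
]
weights = [1 / 3] * 3   # equal interpolation weights, for illustration only

# Per-word ensemble probability is the weighted average of the models' probabilities.
mixed = [sum(w * m[i] for w, m in zip(weights, model_probs))
         for i in range(len(model_probs[0]))]
perplexity = math.exp(-sum(math.log(p) for p in mixed) / len(mixed))
print(f"ensemble perplexity = {perplexity:.1f}")
```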