11 Feb 2016 | Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
This paper presents advances in large-scale language modeling with recurrent neural networks (RNNs). The authors explore techniques such as character-level convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to improve language models on the One Billion Word Benchmark. Their best single model reduces perplexity from 51.3 to 30.0 while using 20 times fewer parameters, and an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. The models are released for the NLP and ML community to study and improve upon.
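For reference, perplexity is the exponential of the average per-word negative log-likelihood on held-out text, so lower is better. A minimal sketch of how such a number is computed, using made-up probabilities rather than anything from the paper:

```python
import math

# Hypothetical per-word probabilities a language model assigns to a
# held-out four-word sequence (illustrative values only).
word_probs = [0.10, 0.02, 0.30, 0.05]

# Perplexity = exp(mean negative log-likelihood over the words).
nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
perplexity = math.exp(nll)
print(f"perplexity = {perplexity:.1f}")
```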
Language modeling is central to NLP and language understanding. Models that accurately predict sentence distributions encode language complexities and distill knowledge from corpora. Recent advances in deep learning and RNNs have enabled better language models, which improve downstream tasks like speech recognition and machine translation. Large-scale language models can compactly extract knowledge from training data, as demonstrated by models trained on movie subtitles that can generate answers to questions about objects and people.
The authors focus on the One Billion Word Benchmark, a large-scale language modeling task with a vocabulary of roughly 800,000 words. They explore techniques such as character-level CNN inputs and large LSTMs, and design a Softmax loss based on character-level CNNs that is efficient to train and as precise as a full Softmax. These techniques yield significant perplexity improvements for single models, and ensembling the best models improves results further.
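As a rough sketch of the character-level CNN word representation (filter sizes and dimensions below are illustrative assumptions, not the authors' exact hyperparameters), a word's embedding can be built by convolving over its character embeddings and max-pooling over positions:

```python
import torch
import torch.nn as nn

class CharCNNWordEmbedder(nn.Module):
    """Builds a word embedding from its characters (illustrative sizes only)."""
    def __init__(self, n_chars=256, char_dim=16, n_filters=128, kernel_size=5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Convolve over the character positions of each word.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):           # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)        # (batch, max_word_len, char_dim)
        x = x.transpose(1, 2)              # (batch, char_dim, max_word_len)
        x = torch.relu(self.conv(x))       # (batch, n_filters, max_word_len)
        return x.max(dim=2).values         # max-pool over positions -> (batch, n_filters)

# Toy usage: embed a batch of two words, each padded to 8 characters.
words = torch.randint(0, 256, (2, 8))
print(CharCNNWordEmbedder()(words).shape)  # torch.Size([2, 128])
```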
On the input side, the authors replace the word-embedding lookup table with character-level CNN embeddings, which give a smoother and more compact parametrization of word representations. On the output side, a CNN Softmax layer computes word logits from the same kind of character-level embeddings, greatly reducing the number of parameters while staying competitive with a full Softmax.
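A hedged sketch of the CNN Softmax idea: the logit for each candidate word is the dot product between the LSTM state and that word's character-CNN embedding, so the output embedding matrix is computed from characters rather than stored as a free |V| x d parameter table (the paper additionally learns a small per-word correction term, omitted here; all sizes below are illustrative):

```python
import torch

def cnn_softmax_logits(hidden, word_embs):
    """CNN Softmax logits (simplified sketch).

    hidden:    (batch, d)      LSTM outputs
    word_embs: (vocab_size, d) char-CNN embeddings of every vocabulary word,
                               computed on the fly instead of stored as a
                               free |V| x d parameter matrix
    """
    return hidden @ word_embs.t()   # (batch, vocab_size), fed to softmax / cross-entropy

# Toy usage with random tensors standing in for real activations; the real
# benchmark vocabulary is roughly 800k words, not 1000.
logits = cnn_softmax_logits(torch.randn(4, 1024), torch.randn(1000, 1024))
print(logits.shape)  # torch.Size([4, 1000])
```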
The authors also explore character-level LSTMs that predict the next word one character at a time, removing the need for a Softmax over the full word vocabulary, and find that combining word-level and character-level modeling can improve performance.
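A rough sketch of that idea (the wiring and sizes below are assumptions, not the paper's exact setup): a small LSTM scores the characters of the next word conditioned on the word-level context vector, so the output Softmax ranges over characters rather than words:

```python
import torch
import torch.nn as nn

class CharLSTMPredictor(nn.Module):
    """Scores the characters of the next word given a word-level context vector."""
    def __init__(self, n_chars=256, char_dim=32, context_dim=1024, hidden_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Condition the char LSTM on the word-level state by concatenating it
        # to every character input (one simple wiring choice among several).
        self.lstm = nn.LSTM(char_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)   # softmax over characters, not words

    def forward(self, context, char_ids):             # context: (batch, context_dim)
        x = self.char_emb(char_ids)                    # (batch, word_len, char_dim)
        ctx = context.unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.lstm(torch.cat([x, ctx], dim=2))
        return self.out(h)                             # (batch, word_len, n_chars) logits

# Toy usage: score the characters of the next word for a batch of 2 contexts.
logits = CharLSTMPredictor()(torch.randn(2, 1024), torch.randint(0, 256, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 256])
```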
Experiments on the One Billion Word Benchmark show that these models substantially improve perplexity over previously published results. Sampled-Softmax approximations such as importance sampling and noise contrastive estimation make training with the large output vocabulary tractable, and the CNN Softmax layer further reduces the parameter count.
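As a hedged illustration of why sampled losses help (this is a generic sampled-softmax sketch, not the authors' exact estimator): the cross-entropy is computed over the true word plus a small set of sampled negative words instead of the full ~800,000-word vocabulary:

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(hidden, softmax_weights, target_ids, n_samples=4096):
    """Approximate cross-entropy using the target word plus sampled negatives.

    hidden:          (batch, d)       LSTM outputs
    softmax_weights: (vocab_size, d)  output embedding matrix
    target_ids:      (batch,)         indices of the true next words
    """
    vocab_size = softmax_weights.size(0)
    # Uniform negative sampling for simplicity; the paper samples from a
    # unigram-like proposal and corrects for it (importance sampling).
    # Collisions between negatives and targets are ignored here.
    neg_ids = torch.randint(0, vocab_size, (n_samples,))
    cand_ids = torch.cat([target_ids, neg_ids])        # candidates shared by the batch
    logits = hidden @ softmax_weights[cand_ids].t()    # (batch, batch + n_samples)
    # The true word for example i is candidate i by construction.
    targets = torch.arange(hidden.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors; the toy vocab stands in for the benchmark's ~800k words.
loss = sampled_softmax_loss(torch.randn(8, 512), torch.randn(50000, 512),
                            torch.randint(0, 50000, (8,)))
print(loss.item())
```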
The authors conclude that RNN-based language models can be trained on very large amounts of data and clearly outperform competing models, that character-level CNN inputs and the CNN Softmax layer shrink the parameter count without sacrificing accuracy, and that ensembling several models yields a substantial further improvement. They hope their work and released models will inspire further research in large-scale language modeling.
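The ensemble result comes from mixing the per-word probabilities of several trained models. A minimal sketch with made-up probabilities and equal mixing weights (in practice the weights would be tuned on held-out data):

```python
import math

# Hypothetical probabilities three models assign to the same four-word sequence.
model_probs = [
    [0.10, 0.02, 0.30, 0.05],
    [0.12, 0.03, 0.25, 0.04],
    [0.08, 0.05, 0.28, 0.06],
]
weights = [1 / 3] * 3   # equal interpolation weights, for illustration only

# Per-word ensemble probability is the weighted average of the models' probabilities.
mixed = [sum(w * m[i] for w, m in zip(weights, model_probs))
         for i in range(len(model_probs[0]))]
perplexity = math.exp(-sum(math.log(p) for p in mixed) / len(mixed))
print(f"ensemble perplexity = {perplexity:.1f}")
```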