Character-Aware Neural Language Models

2016 | Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
This paper presents a character-aware neural language model that relies only on character-level inputs, while its predictions are still made at the word level. The model applies a convolutional neural network (CNN) over the characters of each word, passes the resulting features through a highway network, and feeds the output into a long short-term memory (LSTM) recurrent neural network language model (RNN-LM); training uses truncated backpropagation through time.

On the English Penn Treebank, the model matches existing state-of-the-art results despite having roughly 60% fewer parameters. On morphologically rich languages (Arabic, Czech, French, German, Spanish, and Russian), it outperforms word-level and morpheme-level LSTM baselines, again with fewer parameters, even when out-of-vocabulary words in the data are replaced with <unk>. Performance is evaluated with perplexity, where the model shows substantial improvements over word-level models.

These results suggest that character inputs are sufficient for language modeling in many languages. Analysis of the word representations produced by the character-composition part of the model shows that it encodes both semantic and orthographic information from characters alone. Because the architecture is simple and parameter-efficient, it is well suited to applications where model size is a concern, and it motivates further research on character-level inputs for other tasks.
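The character-composition pipeline described above can be sketched in miniature. The following pure-Python fragment is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the highway layer is shown for a single dimension with scalar weights rather than matrices. It shows one CNN filter with max-over-time pooling over character embeddings, and a highway layer y = t * g(W_H x + b_H) + (1 - t) * x with transform gate t = sigmoid(W_T x + b_T):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_max_over_time(char_embs, kernel, width):
    """One CNN filter applied over a word's character embeddings,
    followed by max-over-time pooling (one feature per filter).

    char_embs: list of d-dimensional embedding vectors, one per character.
    kernel:    flat list of width * d weights for this single filter.
    """
    outs = []
    for i in range(len(char_embs) - width + 1):
        # Flatten the window of `width` consecutive character embeddings.
        window = [v for emb in char_embs[i:i + width] for v in emb]
        outs.append(math.tanh(sum(w * x for w, x in zip(kernel, window))))
    return max(outs)  # max-over-time pooling

def highway(x, wh, bh, wt, bt):
    """Highway layer, one dimension: y = t * g(wh*x + bh) + (1 - t) * x."""
    t = sigmoid(wt * x + bt)       # transform gate
    h = math.tanh(wh * x + bh)     # candidate transformation
    return t * h + (1.0 - t) * x   # carry the input through when t is small
```

When the transform gate t is near zero, the highway layer carries its input through almost unchanged, which is the property that lets the character-composition stack be deepened without blocking gradient flow.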
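Since the evaluation metric throughout is perplexity, it may help to recall how it is computed: the exponential of the average negative log-likelihood the model assigns to each word. A minimal sketch (the function name is assumed for illustration):

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the mean negative log-likelihood
    over the probabilities the model assigned to each word."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)
```

A model that assigns every word probability 1/4 has perplexity 4, i.e. it is as uncertain as a uniform choice among four words; lower perplexity means a better language model.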