Character-Aware Neural Language Models

2016 | Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
This paper presents a character-aware neural language model that relies only on character-level inputs, while its predictions are still made at the word level. The model applies a convolutional neural network (CNN) over the characters of each word, passes the resulting features through a highway network, and feeds the output into a long short-term memory (LSTM) recurrent neural network language model (RNN-LM); training uses truncated backpropagation through time.

On the English Penn Treebank, the model matches existing state-of-the-art results despite having roughly 60% fewer parameters. On morphologically rich languages (Arabic, Czech, French, German, Spanish, and Russian), it outperforms word-level and morpheme-level LSTM baselines, again with fewer parameters, even when out-of-vocabulary words in the data are replaced with <unk>. Performance is evaluated with perplexity, where the model shows substantial improvements over word-level models.

These results suggest that character inputs are sufficient for language modeling in many languages. Analysis of the word representations produced by the character-composition part of the model shows that it encodes both semantic and orthographic information from characters alone. Because the architecture is simple and parameter-efficient, it is well suited to applications where model size is a concern, and it motivates further research on character-level inputs for other tasks.
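The character-composition pipeline described above can be sketched in miniature. The following pure-Python fragment is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the highway layer is shown for a single dimension with scalar weights rather than matrices. It shows one CNN filter with max-over-time pooling over character embeddings, and a highway layer y = t * g(W_H x + b_H) + (1 - t) * x with transform gate t = sigmoid(W_T x + b_T):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_max_over_time(char_embs, kernel, width):
    """One CNN filter applied over a word's character embeddings,
    followed by max-over-time pooling (one feature per filter).

    char_embs: list of d-dimensional embedding vectors, one per character.
    kernel:    flat list of width * d weights for this single filter.
    """
    outs = []
    for i in range(len(char_embs) - width + 1):
        # Flatten the window of `width` consecutive character embeddings.
        window = [v for emb in char_embs[i:i + width] for v in emb]
        outs.append(math.tanh(sum(w * x for w, x in zip(kernel, window))))
    return max(outs)  # max-over-time pooling

def highway(x, wh, bh, wt, bt):
    """Highway layer, one dimension: y = t * g(wh*x + bh) + (1 - t) * x."""
    t = sigmoid(wt * x + bt)       # transform gate
    h = math.tanh(wh * x + bh)     # candidate transformation
    return t * h + (1.0 - t) * x   # carry the input through when t is small
```

When the transform gate t is near zero, the highway layer carries its input through almost unchanged, which is the property that lets the character-composition stack be deepened without blocking gradient flow.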
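Since the evaluation metric throughout is perplexity, it may help to recall how it is computed: the exponential of the average negative log-likelihood the model assigns to each word. A minimal sketch (the function name is assumed for illustration):

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the mean negative log-likelihood
    over the probabilities the model assigned to each word."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)
```

A model that assigns every word probability 1/4 has perplexity 4, i.e. it is as uncertain as a uniform choice among four words; lower perplexity means a better language model.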