16 Oct 2013 | Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
This paper presents several extensions to the Skip-gram model for learning high-quality distributed vector representations of words and phrases, improving both the quality of the vectors and the training speed. Subsampling of frequent words significantly speeds up training and improves the accuracy of the representations of rarer words. The authors also introduce Negative Sampling, a simplified variant of Noise Contrastive Estimation (NCE), which trains the Skip-gram model faster and yields better vector representations for frequent words than hierarchical softmax.
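As a rough sketch of the subsampling rule described in the paper, the snippet below discards each occurrence of a word w with probability 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency in the corpus and t is a small threshold around 1e-5. The function name and the `freq` table are illustrative, not part of the authors' released code.

```python
import random

def keep_occurrence(word, freq, t=1e-5):
    """Subsampling sketch: discard an occurrence of `word` with probability
    1 - sqrt(t / f(w)), where f(w) is its relative corpus frequency.
    The `freq` dictionary and threshold value are illustrative."""
    f = freq[word]
    p_discard = max(0.0, 1.0 - (t / f) ** 0.5)
    return random.random() >= p_discard

# With t = 1e-5, a word covering 1% of all tokens is dropped roughly 97% of
# the time, while words rarer than t are always kept.
freq = {"the": 0.01, "volga": 2e-7}
kept = [w for w in ["the", "volga", "the", "the"] if keep_occurrence(w, freq)]
```

The effect is that very frequent words contribute far fewer training pairs, freeing the model to spend more updates on informative, less frequent words.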
Word-level representations are limited in their ability to capture idiomatic phrases: the meaning of "Air Canada", for instance, is not a simple combination of the meanings of "Air" and "Canada". To address this, the authors present a method for finding phrases in text and show that it is possible to learn good vector representations for millions of phrases. They also demonstrate that simple vector addition can often produce meaningful results, such as combining "Russia" and "river" to get "Volga River".
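The phrase-finding step can be sketched roughly as follows, assuming a tokenized corpus: each adjacent word pair is scored with the formula from the paper, score(wi, wj) = (count(wi wj) − δ) / (count(wi) × count(wj)), and pairs scoring above a threshold are merged into single tokens. The function name and the δ and threshold values here are illustrative; the paper applies several such passes with decreasing thresholds so that longer phrases can form.

```python
from collections import Counter

def score_bigrams(tokens, delta=5, threshold=1e-4):
    """Score adjacent word pairs and return those that should be merged
    into a single phrase token. `delta` discounts pairs built from very
    infrequent words; `threshold` is corpus dependent (illustrative here)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = set()
    for (wi, wj), count in bigrams.items():
        score = (count - delta) / (unigrams[wi] * unigrams[wj])
        if score > threshold:
            phrases.add((wi, wj))
    return phrases
```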
The Skip-gram model is extended to phrases by first identifying them in the corpus and then treating each phrase as a single token during training. This lets the model learn representations for phrases whose meaning is not a composition of their individual words, and the resulting phrase vectors, like the word vectors, can be combined with simple arithmetic operations to produce meaningful results.
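To make the additive-composition point concrete, here is a minimal sketch assuming a trained embedding matrix `vectors` (one row per vocabulary entry), a `vocab` list, and a `word2id` index; these names are assumptions for illustration, not the paper's released tooling.

```python
import numpy as np

def nearest(query, vectors, vocab, exclude=(), topn=5):
    """Return the `topn` vocabulary entries whose vectors are closest to
    `query` by cosine similarity, skipping the words in `exclude`."""
    q = query / np.linalg.norm(query)
    sims = (vectors @ q) / np.linalg.norm(vectors, axis=1)
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] not in exclude][:topn]

# Additive composition: the sum of two vectors often lands near a related
# word or phrase, e.g. "Russia" + "river" -> "Volga River".
# combined = vectors[word2id["Russia"]] + vectors[word2id["river"]]
# print(nearest(combined, vectors, vocab, exclude={"Russia", "river"}))
```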
The paper evaluates the performance of different Skip-gram models on an analogical reasoning task. The results show that Negative Sampling outperforms the Hierarchical Softmax on this task. The authors also show that the Skip-gram model can be trained on large datasets, achieving high accuracy on the phrase analogy task.
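The analogy task itself is typically scored with the vector offset method: the answer to "a is to b as c is to ?" is taken to be the word or phrase whose vector is closest to vec(b) − vec(a) + vec(c). A small sketch, under the same assumed `vectors`, `vocab`, and `word2id` names as above:

```python
import numpy as np

def analogy(a, b, c, vectors, vocab, word2id, topn=1):
    """Solve "a : b :: c : ?" by the vector offset method: return the entries
    closest (by cosine) to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    query = vectors[word2id[b]] - vectors[word2id[a]] + vectors[word2id[c]]
    query /= np.linalg.norm(query)
    sims = (vectors @ query) / np.linalg.norm(vectors, axis=1)
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] not in (a, b, c)][:topn]

# A phrase analogy from the paper's test set would look like:
# analogy("Montreal", "Montreal Canadiens", "Toronto", vectors, vocab, word2id)
# -> ["Toronto Maple Leafs"]
```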
The paper concludes that the Skip-gram model is a powerful and efficient tool for learning distributed representations of words and phrases. The linear structure of the learned vectors makes precise analogical reasoning possible with simple vector arithmetic, the same training procedure extends naturally to phrases, and the model scales to very large datasets. The code for training the word and phrase vectors based on the techniques described in this paper is available as an open-source project.