16 Oct 2013 | Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
This paper presents several extensions to the Skip-gram model for learning high-quality distributed vector representations of words and phrases, improving both the quality of the vectors and the training speed. Subsampling of frequent words significantly speeds up training and improves the accuracy of the representations of rarer words. The authors also introduce Negative Sampling, a simplified variant of Noise Contrastive Estimation (NCE), which trains the Skip-gram model faster and yields better vector representations for frequent words than hierarchical softmax.
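As a rough sketch of the subsampling rule described in the paper, the snippet below discards each occurrence of a word w with probability 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency in the corpus and t is a small threshold around 1e-5. The function name and the `freq` table are illustrative, not part of the authors' released code.

```python
import random

def keep_occurrence(word, freq, t=1e-5):
    """Subsampling sketch: discard an occurrence of `word` with probability
    1 - sqrt(t / f(w)), where f(w) is its relative corpus frequency.
    The `freq` dictionary and threshold value are illustrative."""
    f = freq[word]
    p_discard = max(0.0, 1.0 - (t / f) ** 0.5)
    return random.random() >= p_discard

# With t = 1e-5, a word covering 1% of all tokens is dropped roughly 97% of
# the time, while words rarer than t are always kept.
freq = {"the": 0.01, "volga": 2e-7}
kept = [w for w in ["the", "volga", "the", "the"] if keep_occurrence(w, freq)]
```

The effect is that very frequent words contribute far fewer training pairs, freeing the model to spend more updates on informative, less frequent words.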
Word-level representations are limited in their ability to capture idiomatic phrases: the meaning of "Air Canada", for instance, is not a simple combination of the meanings of "Air" and "Canada". To address this, the authors present a method for finding phrases in text and show that it is possible to learn good vector representations for millions of phrases. They also demonstrate that simple vector addition can often produce meaningful results, such as combining "Russia" and "river" to get "Volga River".
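The phrase-finding step can be sketched roughly as follows, assuming a tokenized corpus: each adjacent word pair is scored with the formula from the paper, score(wi, wj) = (count(wi wj) − δ) / (count(wi) × count(wj)), and pairs scoring above a threshold are merged into single tokens. The function name and the δ and threshold values here are illustrative; the paper applies several such passes with decreasing thresholds so that longer phrases can form.

```python
from collections import Counter

def score_bigrams(tokens, delta=5, threshold=1e-4):
    """Score adjacent word pairs and return those that should be merged
    into a single phrase token. `delta` discounts pairs built from very
    infrequent words; `threshold` is corpus dependent (illustrative here)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = set()
    for (wi, wj), count in bigrams.items():
        score = (count - delta) / (unigrams[wi] * unigrams[wj])
        if score > threshold:
            phrases.add((wi, wj))
    return phrases
```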
The Skip-gram model is extended to phrases by first identifying them in the corpus and then treating each phrase as a single token during training. This lets the model learn representations for phrases whose meaning is not a composition of their individual words, and the resulting phrase vectors, like the word vectors, can be combined with simple arithmetic operations to produce meaningful results.
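To make the additive-composition point concrete, here is a minimal sketch assuming a trained embedding matrix `vectors` (one row per vocabulary entry), a `vocab` list, and a `word2id` index; these names are assumptions for illustration, not the paper's released tooling.

```python
import numpy as np

def nearest(query, vectors, vocab, exclude=(), topn=5):
    """Return the `topn` vocabulary entries whose vectors are closest to
    `query` by cosine similarity, skipping the words in `exclude`."""
    q = query / np.linalg.norm(query)
    sims = (vectors @ q) / np.linalg.norm(vectors, axis=1)
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] not in exclude][:topn]

# Additive composition: the sum of two vectors often lands near a related
# word or phrase, e.g. "Russia" + "river" -> "Volga River".
# combined = vectors[word2id["Russia"]] + vectors[word2id["river"]]
# print(nearest(combined, vectors, vocab, exclude={"Russia", "river"}))
```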
The paper evaluates the performance of different Skip-gram models on an analogical reasoning task. The results show that Negative Sampling outperforms the Hierarchical Softmax on this task. The authors also show that the Skip-gram model can be trained on large datasets, achieving high accuracy on the phrase analogy task.
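The analogy task itself is typically scored with the vector offset method: the answer to "a is to b as c is to ?" is taken to be the word or phrase whose vector is closest to vec(b) − vec(a) + vec(c). A small sketch, under the same assumed `vectors`, `vocab`, and `word2id` names as above:

```python
import numpy as np

def analogy(a, b, c, vectors, vocab, word2id, topn=1):
    """Solve "a : b :: c : ?" by the vector offset method: return the entries
    closest (by cosine) to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    query = vectors[word2id[b]] - vectors[word2id[a]] + vectors[word2id[c]]
    query /= np.linalg.norm(query)
    sims = (vectors @ query) / np.linalg.norm(vectors, axis=1)
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] not in (a, b, c)][:topn]

# A phrase analogy from the paper's test set would look like:
# analogy("Montreal", "Montreal Canadiens", "Toronto", vectors, vocab, word2id)
# -> ["Toronto Maple Leafs"]
```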
The paper concludes that the Skip-gram model is a powerful and efficient tool for learning distributed representations of words and phrases. The linear structure of the learned vectors makes precise analogical reasoning possible with simple vector arithmetic, the same training procedure extends naturally to phrases, and the model scales to very large datasets. The code for training the word and phrase vectors based on the techniques described in this paper is available as an open-source project.