Improving Distributional Similarity with Lessons Learned from Word Embeddings


2015 | Omer Levy, Yoav Goldberg, Ido Dagan
This paper investigates the performance of word embedding models and argues that much of their success stems from system design choices and hyperparameter tuning rather than from the embedding algorithms themselves. The authors show that these design choices can be transferred to traditional count-based distributional models, yielding comparable gains, which contradicts claims that embeddings are inherently superior to count-based methods.

The paper compares four word representation methods: an explicit positive pointwise mutual information (PPMI) matrix, SVD factorization of that matrix, skip-gram with negative sampling (SGNS), and GloVe. While SGNS and GloVe often perform better on word similarity and analogy tasks out of the box, the count-based methods improve substantially once they adopt hyperparameters introduced with the neural models. The transferable hyperparameters include dynamic context windows, subsampling of frequent words, shifted PMI, context distribution smoothing, and adding context vectors.

Hyperparameter tuning significantly improves performance on every task; in many cases, changing a single hyperparameter yields a larger gain than switching to a better algorithm or training on a larger corpus. When all methods are allowed to tune a similar set of hyperparameters, their performance is largely comparable, and no single algorithmic approach shows a consistent advantage over the others.
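To make two of these transferable hyperparameters concrete, below is a minimal sketch (not the authors' code) of building a shifted PPMI matrix with context distribution smoothing. It assumes a dense word-by-context co-occurrence count matrix called `counts`; the function name and the default values `alpha=0.75` and `k=5` are illustrative choices, not prescribed by the paper beyond the settings it reports.

```python
import numpy as np

def shifted_ppmi(counts, alpha=0.75, k=5):
    """Sketch: SPPMI = max(PMI(w, c) - log k, 0), with the context
    distribution smoothed by raising context counts to the power alpha."""
    counts = counts.astype(float)
    total = counts.sum()

    # Word (row) probabilities from raw counts.
    p_w = counts.sum(axis=1) / total

    # Smoothed context (column) probabilities: P_alpha(c) = #(c)^alpha / sum_c' #(c')^alpha.
    c_counts = counts.sum(axis=0) ** alpha
    p_c = c_counts / c_counts.sum()

    # Joint probabilities P(w, c).
    p_wc = counts / total

    # PMI(w, c) = log( P(w, c) / (P(w) * P_alpha(c)) ); unseen pairs stay at zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / np.outer(p_w, p_c))
    pmi[~np.isfinite(pmi)] = 0.0

    # Shift by log k (the analogue of SGNS's k negative samples) and clip at zero.
    return np.maximum(pmi - np.log(k), 0.0)
```

With `alpha=1` and `k=1` this reduces to the standard PPMI matrix; the resulting matrix can be used directly as a sparse explicit representation or factorized with SVD to obtain dense vectors.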
The paper concludes that the success of word embedding models is largely due to hyperparameter tuning and system design choices rather than the embedding algorithms themselves. It challenges the claim that embeddings are superior to count-based methods and emphasizes the value of controlled-variable experiments and transparent, reproducible research, along with careful attention to hyperparameter settings and preprocessing.
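As a further illustration of a transferable design choice discussed above, the "adding context vectors" heuristic represents each word by the sum of its word and context vectors before measuring similarity. The sketch below assumes hypothetical arrays `W` and `C` of shape (vocabulary size, dimension), e.g. the two factor matrices produced by SVD or the two embedding tables learned by SGNS; the function name is illustrative.

```python
import numpy as np

def combined_similarity(W, C, i, j):
    """Cosine similarity between words i and j using the w + c representation."""
    V = W + C                                          # add word and context vectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # L2-normalize each row
    return float(V[i] @ V[j])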