August 7-12, 2016 | William L. Hamilton, Jure Leskovec, Dan Jurafsky
This paper presents a study of semantic change using diachronic word embeddings. The authors evaluate three embedding methods (PPMI, SVD, and word2vec's skip-gram with negative sampling, SGNS) on six historical corpora spanning four languages and two centuries. They propose two statistical laws of semantic change: (i) the law of conformity, under which the rate of semantic change scales as an inverse power law of word frequency; and (ii) the law of innovation, under which polysemous words change more rapidly than less polysemous words, even after controlling for frequency.
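The law of conformity states that the rate of change scales as an inverse power law of frequency, i.e. it is linear in log-log space. The sketch below illustrates how such an exponent can be recovered by a log-log regression; the data, the exponent value, and the noise model are all synthetic assumptions for illustration, not the paper's corpora or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration (not the paper's data): generate change rates
# that follow rate ~ freq^(-beta) with multiplicative noise.
beta_true = 0.8
freqs = rng.lognormal(mean=8.0, sigma=2.0, size=1000)   # word frequencies
rates = freqs ** (-beta_true) * np.exp(rng.normal(0.0, 0.1, size=1000))

# An inverse power law is linear in log-log space:
#   log(rate) = -beta * log(freq) + c
slope, intercept = np.polyfit(np.log(freqs), np.log(rates), 1)
beta_hat = -slope
print(f"estimated beta = {beta_hat:.2f}")
```

With enough words and modest noise, the fitted exponent closely recovers the one used to generate the data; the paper's analysis additionally controls for polysemy and corpus effects in a mixed model.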
The study builds distributional word embeddings for successive time periods and aligns them across periods so that semantic change can be measured directly. Frequent words are found to change more slowly, while polysemous words change more rapidly. A linear mixed-model analysis supports both findings: the logarithm of word frequency has a significant negative effect on the rate of semantic change, and the logarithm of polysemy has a significant positive effect.
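Embeddings trained on different time periods live in arbitrarily rotated spaces, so they must be aligned before distances are comparable; the paper uses orthogonal Procrustes alignment for this. A minimal numpy sketch of that alignment step, with a toy sanity check on synthetic matrices (the matrix sizes here are illustrative assumptions):

```python
import numpy as np

def procrustes_align(W_old, W_new):
    """Rotate W_old onto W_new with the orthogonal matrix R that
    minimizes ||W_old @ R - W_new||_F (orthogonal Procrustes).
    Rows are word vectors; both matrices share the same vocabulary."""
    U, _, Vt = np.linalg.svd(W_old.T @ W_new)
    return W_old @ (U @ Vt)

# Toy check: a randomly rotated copy of a matrix aligns back exactly.
rng = np.random.default_rng(1)
W = rng.normal(size=(50, 10))                    # 50 "words", 10 dims
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))   # random orthogonal matrix
aligned = procrustes_align(W @ Q, W)
print(np.allclose(aligned, W))                   # True
```

After alignment, a word's semantic change can be scored as the distance (e.g. cosine distance) between its old and new vectors.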
The authors also compare the embedding approaches and find that SVD performs best on synchronic accuracy tasks, while SGNS performs best on discovery tasks. They conclude that both methods are useful for studying semantic change, each with its own trade-offs: SGNS is more robust to corpus artifacts, while SVD is more sensitive to changes in usage.
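Of the compared methods, PPMI is the simplest to state: it reweights a word-context co-occurrence count matrix by pointwise mutual information, clipped at zero (and the SVD method then factorizes this matrix). A minimal sketch with a tiny made-up count matrix:

```python
import numpy as np

def ppmi(C):
    """Positive pointwise mutual information from a word-context
    co-occurrence count matrix C (rows: words, columns: contexts)."""
    total = C.sum()
    p_wc = C / total                             # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)        # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)        # context marginals
    with np.errstate(divide="ignore"):           # log(0) -> -inf, clipped below
        pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)

# Toy co-occurrence counts for three words and three contexts.
C = np.array([[10, 0, 2],
              [0, 8, 1],
              [3, 1, 5]], dtype=float)
M = ppmi(C)
print(M.shape)  # (3, 3)
```

Pairs that never co-occur get a PPMI of exactly zero, which keeps the matrix sparse; hyperparameters such as context-distribution smoothing, which the full methods tune, are omitted here.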
The study reveals that frequency and polysemy are key factors in semantic change, explaining between 48% and 88% of the variance in semantic change rates. These findings have important implications for understanding the mechanisms of semantic change and the role of frequency and polysemy in language evolution. The results suggest that polysemy may actually lead to semantic change, challenging previous assumptions. The study also highlights the importance of distributional models in historical research and the need for further investigation into the causal mechanisms underlying semantic change.