[slides] Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

The paper "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change" by William L. Hamilton, Jure Leskovec, and Dan Jurafsky uses word embeddings to study how word meanings change over time. The authors develop a robust methodology for quantifying semantic change by evaluating three embedding methods (PPMI, SVD, and word2vec's SGNS) against known historical changes. Using six historical corpora spanning four languages and two centuries, they propose two quantitative laws of semantic change:

1. **The Law of Conformity**: The rate of semantic change scales with an inverse power law of word frequency, so frequently used words change more slowly over time.
2. **The Law of Innovation**: Independent of frequency, words that are more polysemous (have multiple meanings) have higher rates of semantic change.

The study aligns word vectors across time periods and quantifies semantic change in two ways: time series of pairwise word similarity, and the shift of an individual word's embedding between periods. The embedding approaches (PPMI, SVD, SGNS) are compared on synchronic accuracy (within a time period) and diachronic validity (over time). SVD is the most accurate on the synchronic task and has the higher average accuracy at detecting known shifts, while SGNS performs best at discovering new shifts from the data. The paper also discusses the implications of these findings for future research in historical semantics, suggesting that frequency and polysemy are crucial factors in explaining rates of semantic change.
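
The alignment and embedding-shift measurement mentioned in the summary can be illustrated with a small sketch. It assumes orthogonal Procrustes for the alignment step and cosine distance for the shift, neither of which is spelled out in the summary itself; the function names (`procrustes_align`, `semantic_displacement`) and the synthetic data are hypothetical and purely illustrative, not the authors' code.

```python
import numpy as np


def procrustes_align(base, other):
    """Rotate `other` onto `base` using the orthogonal matrix R that
    minimizes ||other @ R - base|| (Frobenius norm).

    Rows of both matrices must refer to the same words in the same order.
    """
    # The SVD of the cross-covariance matrix gives the optimal rotation U @ Vt.
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def cosine_distance(a, b):
    """Row-wise cosine distance between two equally shaped matrices."""
    num = np.sum(a * b, axis=1)
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / denom


def semantic_displacement(emb_t, emb_t_next):
    """Per-word shift between two periods, measured after alignment."""
    aligned = procrustes_align(emb_t, emb_t_next)
    return cosine_distance(emb_t, aligned)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size, dim = 50, 8  # toy sizes, chosen only for the demo

    # Synthetic "earlier period" embeddings.
    emb_early = rng.normal(size=(vocab_size, dim))

    # A random orthogonal matrix stands in for the arbitrary axis
    # differences between independently trained embedding spaces.
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

    # Simulate one word (index 3) changing meaning, then rotate the space.
    shifted = emb_early.copy()
    shifted[3] = -shifted[3]
    emb_late = shifted @ q

    disp = semantic_displacement(emb_early, emb_late)
    print("word with largest displacement:", int(np.argmax(disp)))
```

Under these assumptions, the rotation removes differences that come only from the embedding spaces having different axes, so the remaining per-word distance can be read as a change in the word's relation to the rest of the vocabulary.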


August 7-12, 2016 | William L. Hamilton, Jure Leskovec, Dan Jurafsky