This paper presents a generalization of the Skip-gram word embedding model, which traditionally uses linear contexts (words surrounding a target word in a fixed window). The authors extend this model to include arbitrary contexts, such as dependency-based syntactic contexts derived from dependency parse trees. They show that these different types of contexts produce markedly different word embeddings. Dependency-based embeddings are less topical and exhibit more functional similarity than the original Skip-gram embeddings.
The Skip-gram model is grounded in the distributional hypothesis: words that appear in similar contexts tend to have similar meanings. The model learns word embeddings by maximizing the dot product between word and context vectors for observed word-context pairs while minimizing it for unobserved pairs. Training is made efficient through negative sampling, in which a small number of randomly drawn contexts serve as the unobserved (negative) pairs for each observed pair.
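As a rough illustration (a minimal sketch, not the paper's actual training code), the per-pair objective could look like the following, where `word_vec`, `ctx_vec`, and the matrix of sampled negative context vectors are assumed to come from a skip-gram model trained with negative sampling:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(word_vec, ctx_vec, neg_ctx_vecs):
    """Negative-sampling objective for one observed (word, context) pair.

    word_vec, ctx_vec: d-dimensional vectors of the observed pair.
    neg_ctx_vecs: (k, d) matrix of k randomly sampled context vectors.
    """
    # Observed pair: reward a large word-context dot product.
    positive = np.log(sigmoid(word_vec @ ctx_vec))
    # Sampled (unobserved) pairs: reward small dot products.
    negative = np.sum(np.log(sigmoid(-(neg_ctx_vecs @ word_vec))))
    return positive + negative  # quantity to be maximized
```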
The authors experiment with three types of contexts: BOW5 (bag-of-words with a window of 5 words), BOW2 (bag-of-words with a window of 2 words), and DEPS (dependency-based contexts extracted from parse trees). They find that DEPS embeddings exhibit more functional similarity, capturing semantic types rather than topical associations. For example, the nearest neighbours of "Hogwarts" under DEPS are other schools (words of the same semantic type), while the BOW neighbours are domain-related words from the Harry Potter books. A sketch of the dependency-context extraction follows.
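To make the DEPS contexts concrete, here is a minimal sketch of how (word, context) pairs might be extracted from a dependency parse. It uses spaCy purely for illustration; the paper uses a different parser and additionally collapses prepositions into single relations, which this sketch omits.

```python
# Illustrative only: requires `pip install spacy` and the small English model
# (`python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_contexts(sentence):
    """Yield (word, context) pairs where each context is a neighbouring
    word in the parse tree annotated with its dependency relation."""
    doc = nlp(sentence)
    for token in doc:
        # Contexts from the token's modifiers: (modifier, relation).
        for child in token.children:
            yield token.text.lower(), f"{child.text.lower()}/{child.dep_}"
        # Context from the token's head: (head, relation^-1).
        if token.head is not token:
            yield token.text.lower(), f"{token.head.text.lower()}/{token.dep_}-1"

# Example sentence from the paper.
for pair in dependency_contexts("Australian scientist discovers star with telescope"):
    print(pair)
```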
The authors also demonstrate that the Skip-gram model allows for a degree of introspection: by querying the model for the contexts most "activated" by a target word (those with the highest scores against its vector), one can explore which contexts the model has learned to be most discriminative for that word.
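A minimal sketch of this kind of query, assuming hypothetical trained word vectors `W` and context vectors `C` (one row per vocabulary item) together with lookup tables `word2id` and `ctx_vocab`:

```python
import numpy as np

def top_activated_contexts(word, word2id, ctx_vocab, W, C, n=10):
    """Return the n contexts with the highest dot product against the
    target word's vector -- the contexts the model treats as most
    discriminative for that word."""
    scores = C @ W[word2id[word]]     # dot product with every context vector
    best = np.argsort(-scores)[:n]    # indices of the n highest scores
    return [(ctx_vocab[i], float(scores[i])) for i in best]
```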
The paper concludes that dependency-based contexts produce more focused embeddings, capturing functional similarity rather than topical similarity. These results align with findings in the distributional semantics literature. The authors hope that insights from model introspection will lead to better context modeling and improved word embeddings. Their software, which allows for experimentation with arbitrary contexts, is available for download.