Linguistic Regularities in Continuous Space Word Representations

9-14 June 2013 | Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig
This paper studies linguistic regularities in continuous space word representations. The authors examine the vector-space word representations learned by the input-layer weights of continuous space language models and find them surprisingly effective at capturing syntactic and semantic regularities: each relationship is characterized by a relation-specific vector offset, which permits vector-oriented reasoning based on the offsets between words. For example, the male/female relationship is learned automatically, and with the induced vector representations, "King - Man + Woman" yields a vector very close to "Queen". The paper demonstrates that the word vectors capture syntactic regularities through a test set of syntactic analogy questions, of which they answer almost 40% correctly, and that they capture semantic regularities by applying the vector offset method to SemEval-2012 Task 2 questions, outperforming previous systems.
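The offset method is straightforward to implement. The sketch below is illustrative, not the authors' code: it assumes a dictionary `vectors` mapping words to L2-normalized NumPy arrays, computes y = x_b - x_a + x_c for a question "a is to b as c is to __", and returns the vocabulary word whose vector has the greatest cosine similarity to y (excluding the three question words, a common practical refinement).

```python
import numpy as np

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector offset method.

    vectors: dict mapping each word to an L2-normalized numpy array.
    Returns the word (other than a, b, c) whose vector has the highest
    cosine similarity to y = x_b - x_a + x_c.
    """
    y = vectors[b] - vectors[a] + vectors[c]
    y /= np.linalg.norm(y)  # unit-normalize so a dot product equals cosine similarity

    best_word, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in (a, b, c):        # skip the question words themselves
            continue
        sim = float(np.dot(vec, y))  # both vectors are unit-length
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# With vectors from a trained model, one would hope for:
# solve_analogy("man", "king", "woman", vectors)  ->  "queen"
```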
The study uses a recurrent neural network language model (RNNLM) to generate the word vectors; the word representations are found in the columns of the input weight matrix U. The RNN is trained with backpropagation to maximize the log-likelihood of the data under the model. The model itself has no explicit knowledge of syntax or semantics, yet training this purely lexical model to maximize likelihood induces word representations with striking syntactic and semantic properties. To evaluate the learned representations, the authors created a test set of syntactic analogy questions and used SemEval-2012 Task 2 to measure semantic regularities. The vector offset method, which assumes relationships are present as vector offsets, solves both kinds of analogy question. Experimentally, the RNN-based representations capture significantly more syntactic regularity than LSA vectors and perform well in an absolute sense, answering more than one in three questions correctly. They also outperform previous systems on the semantic task, even though they were never specifically trained or tuned for it, indicating that the representations are robust and effective at capturing both syntactic and semantic regularities in language.
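To make the architecture concrete, the standard RNNLM formulation can be written as follows, with w(t) the 1-of-N (one-hot) encoding of the word at time t, s(t) the hidden state, f a sigmoid, and g a softmax over the vocabulary:

```latex
s(t) = f\big(\mathbf{U}\,w(t) + \mathbf{W}\,s(t-1)\big), \qquad
y(t) = g\big(\mathbf{V}\,s(t)\big),
\quad\text{where}\quad
f(z) = \frac{1}{1 + e^{-z}}, \qquad
g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}.
```

Because w(t) is one-hot, the product Uw(t) simply selects a single column of U, which is why each word's representation lives in the corresponding column of the input weight matrix.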