11 Dec 2018 | Robyn Speer, Joshua Chin, Catherine Havasi
ConceptNet 5.5 is an open multilingual knowledge graph that connects words and phrases with labeled edges, representing general knowledge for natural language understanding. It integrates knowledge from various sources, including expert-created resources, crowd-sourcing, and games with a purpose. ConceptNet is particularly useful for natural language processing (NLP) techniques such as word embeddings: its knowledge can be combined with corpus-based methods to build semantic spaces that are more effective than distributional semantics alone.
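To make the graph structure concrete, the sketch below queries the public ConceptNet web API for edges attached to a single English term and prints the labeled relations. The endpoint at api.conceptnet.io and the JSON field names ("edges", "rel", "start", "end", "weight") reflect the ConceptNet 5.5 API as commonly documented; treat them as assumptions rather than a guaranteed contract.

```python
import requests

# Fetch edges attached to the English term "learn" from the public ConceptNet API.
# Field names below ("edges", "rel", "start", "end", "weight") are assumptions
# based on the ConceptNet 5.5 web API and may change between versions.
response = requests.get("http://api.conceptnet.io/c/en/learn")
data = response.json()

for edge in data["edges"][:10]:
    print(edge["rel"]["label"],        # relation label, e.g. "UsedFor"
          edge["start"]["label"], "->", edge["end"]["label"],
          round(edge["weight"], 2))    # confidence weight of the assertion
```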
ConceptNet 5.5 includes over 21 million edges and 8 million nodes spanning a large multilingual vocabulary. Its edges express relations between words and phrases, such as IsA, UsedFor, and CapableOf, and the graph can be used to represent word meanings as vectors. ConceptNet can be combined with word embeddings from distributional semantics, such as word2vec, to create a hybrid semantic space called ConceptNet Numberbatch, which performs significantly better than other systems on word-relatedness evaluations.
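A minimal way to work with the resulting hybrid space is to load the released Numberbatch vectors (distributed as plain-text word-vector files on the project's GitHub releases) and compute cosine relatedness. The file name used in the example is illustrative, and the loader assumes the common "term v1 v2 ..." text format with an optional count header.

```python
import numpy as np

def load_vectors(path, limit=50000):
    """Load vectors from a Numberbatch-style text file:
    one 'term v1 v2 ...' line per word, optionally preceded by a 'rows dims' header."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            parts = line.rstrip().split(" ")
            if i == 0 and len(parts) == 2:  # skip optional header line
                continue
            if len(vectors) >= limit:
                break
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

def relatedness(vectors, a, b):
    """Cosine similarity between two terms' vectors."""
    va, vb = vectors[a], vectors[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Example usage (file name is an assumption; see the Numberbatch releases):
# vecs = load_vectors("numberbatch-en.txt")
# print(relatedness(vecs, "cat", "dog"))
```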
ConceptNet Numberbatch is built by combining ConceptNet 5.5 with word2vec and GloVe embeddings, and it is evaluated on a range of semantic tasks. It outperforms other systems on SAT-style analogy questions, reaching 56.1% accuracy, slightly below the average human test-taker, and it achieves 59.4% accuracy at choosing the correct ending on the Story Cloze Test.
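As an illustration of how analogy questions can be scored with such a vector space, the sketch below picks the candidate answer whose vector offset best matches the offset of the example pair. This is a simplified pairwise-offset heuristic for exposition, not the exact scoring procedure used in the paper's SAT evaluation.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(vectors, a, b, c, candidates):
    """Pick the candidate d whose offset (d - c) is most similar to (b - a).
    Simplified illustration; the paper's analogy scoring differs in detail."""
    target = vectors[b] - vectors[a]
    scores = {d: cosine(target, vectors[d] - vectors[c]) for d in candidates}
    return max(scores, key=scores.get)

# Example: "cat is to kitten as dog is to ?" with multiple-choice candidates.
# best = solve_analogy(vecs, "cat", "kitten", "dog", ["puppy", "bone", "bark"])
```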
ConceptNet provides a rich source of knowledge for improving word embeddings and the NLP applications built on them, making the resulting vector representations of word meaning more accurate and robust. The hybrid system ConceptNet Numberbatch demonstrates that combining knowledge-based and distributional semantics leads to better performance on semantic tasks. The system is available through GitHub and the ConceptNet website.
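The combination of knowledge graph and distributional embeddings builds on retrofitting (Faruqui et al., 2015): each word's vector is nudged toward the average of its graph neighbors while staying close to its original distributional vector. The loop below is a minimal sketch of that basic update under default weights; Numberbatch itself uses an expanded variant of this idea, so this is not the exact Numberbatch procedure.

```python
import numpy as np

def retrofit(embeddings, neighbors, iterations=10, alpha=1.0, beta=1.0):
    """Minimal retrofitting loop (after Faruqui et al., 2015).
    embeddings: dict of word -> original vector.
    neighbors: dict of word -> list of graph-adjacent words (e.g. from ConceptNet edges).
    alpha weights fidelity to the original vector; beta weights each neighbor."""
    new = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, original in embeddings.items():
            linked = [new[n] for n in neighbors.get(word, []) if n in new]
            if not linked:
                continue
            weight_sum = alpha + beta * len(linked)
            new[word] = (alpha * original + beta * np.sum(linked, axis=0)) / weight_sum
    return new

# neighbors could be derived from ConceptNet, e.g. {"cat": ["kitten", "pet"], ...}
```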