Word Embeddings Revisited: Do LLMs Offer Something New?

2 Mar 2024 | Matthew Freestone, Shubhra Kanti Karmaker Santu
This paper investigates whether Large Language Models (LLMs) offer new insights into word embeddings compared to classical encoders such as SentenceBERT (SBERT) and the Universal Sentence Encoder (USE). The study compares LLM-based embeddings with classical ones in terms of latent vector semantics and performance on word analogy tasks. Six embedding models are evaluated: three LLM-based (LLaMA2-7B, ADA-002, PaLM2) and three classical (SBERT, USE, LASER).

The analysis covers two main tasks: word-pair similarity and word analogy. For word-pair similarity, the cosine similarity distributions of semantically related, morphologically related, and unrelated word pairs were compared. LLMs such as ADA and PaLM distinguished semantically related pairs from unrelated ones more cleanly than most classical models and tended to cluster semantically related words more tightly; SBERT also performed well on this task.

For word analogy, the study used the Bigger Analogy Test Set (BATS) and evaluated all six models using vector arithmetic in the latent space. ADA and PaLM achieved high accuracy, while LLaMA performed worst among the LLMs. SBERT, a lighter classical model, often ranked third, suggesting it can be an efficient alternative in resource-constrained settings.

The study also examined where LLMs and classical models agree or disagree on word-pair similarities. ADA and PaLM agreed with each other and with SBERT, reinforcing the conclusion that SBERT can serve as an efficient substitute when resources are limited. Overall, the findings indicate that while LLMs capture meaningful semantics and yield high accuracy, classical models like SBERT remain effective in resource-constrained environments.
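The two evaluation mechanics described above — comparing cosine similarities of word pairs and solving analogies by vector arithmetic — can be sketched as follows. This is a minimal illustration, not the paper's actual code; the 2-D word vectors below are toy values chosen for clarity, whereas the study used embeddings produced by the six models.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a: str, b: str, c: str, embeddings: dict) -> str:
    """Solve a : b :: c : ? via vector arithmetic.

    The target point is b - a + c; the answer is the nearest vocabulary
    word by cosine similarity, excluding the three query words (the
    standard convention for analogy benchmarks such as BATS).
    """
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best_word, best_sim = None, -1.0
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        sim = cosine_similarity(target, vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# Toy 2-D "embeddings" (hypothetical values, for illustration only).
emb = {
    "king":  np.array([1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 0.0]),
    "queen": np.array([0.0, 1.0]),
}

# man : king :: woman : ?  ->  king - man + woman lands on queen.
print(solve_analogy("man", "king", "woman", emb))  # queen
```

In the paper's word-pair analysis, the same `cosine_similarity` score is computed for many semantically related, morphologically related, and unrelated pairs, and the resulting distributions are compared per model; a model separates pair types well when the related-pair distribution sits clearly above the unrelated one.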
The study highlights the importance of considering both the scale and the underlying semantics of embeddings when evaluating their performance.