March 11, 2024 | Harald Steck, Chaitanya Ekanadham, Nathan Kallus
The paper investigates whether the cosine similarity of learned embeddings truly reflects semantic similarity. Cosine similarity is the cosine of the angle between two vectors, or equivalently the dot product of their normalized versions, and it is widely used to measure semantic similarity between high-dimensional objects via their learned embeddings. However, it can yield arbitrary or even meaningless results where the unnormalized dot product does not.

The authors analyze embeddings obtained from regularized linear models and show that cosine similarity can produce arbitrary results because of a degree of freedom in the learned embeddings, even when their unnormalized dot products are well-defined and unique. They derive analytical solutions for linear matrix-factorization (MF) models, demonstrating that cosine similarity is not inherently meaningful and depends on the regularization: different regularization schemes yield different cosine similarities even when the underlying model is invariant to the corresponding transformations. Concretely, the paper contrasts two MF training objectives that differ only in how the L2 regularization is applied: the first leaves the embeddings free to be rescaled arbitrarily, so their cosine similarities are not unique, while the second removes this freedom and yields unique cosine similarities.
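To make the definition and the rescaling degree of freedom concrete, the following is a sketch in generic matrix-factorization notation; the symbols X (data matrix), A and B (embedding matrices), λ, and D are shorthand chosen here, not necessarily the paper's exact formulation.

```latex
% Cosine similarity: dot product of the L2-normalized vectors.
\mathrm{cosSim}(u, v) \;=\; \frac{u^{\top} v}{\|u\|_{2}\,\|v\|_{2}}

% First regularization scheme: L2 penalty on the product of the factors.
\min_{A,B}\;\|X - AB^{\top}\|_F^{2} \;+\; \lambda\,\|AB^{\top}\|_F^{2}

% Second regularization scheme: L2 penalty on each factor separately.
\min_{A,B}\;\|X - AB^{\top}\|_F^{2} \;+\; \lambda\left(\|A\|_F^{2} + \|B\|_F^{2}\right)

% The first objective depends on A and B only through AB^T, which is
% invariant under rescaling by any invertible diagonal matrix D:
(AD)\,(BD^{-1})^{\top} \;=\; A\,D\,D^{-1}\,B^{\top} \;=\; AB^{\top}

% so the fit and the penalty are unchanged while the cosine similarities
% between rows of A (or of B) vary with the arbitrary choice of D. The
% per-factor penalties in the second scheme are not invariant under this
% rescaling, which removes the arbitrariness.
```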
The paper cautions against using cosine similarity blindly and suggests alternatives, such as training models directly with respect to cosine similarity or avoiding the embedding space altogether. Experiments on simulated data show that cosine similarities can vary significantly with the model and regularization used, underscoring the need for care when applying cosine similarity. The findings suggest that, while cosine similarity is widely used, it may not always capture semantic similarity accurately, especially in deep learning models where multiple regularization techniques are applied.
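As a concrete footnote to the rescaling argument above, here is a minimal numerical sketch, assuming nothing beyond NumPy; the matrices A, B and the diagonal D are illustrative stand-ins rather than the paper's experimental setup.

```python
# Rescaling A -> A @ D and B -> B @ inv(D) with a diagonal D leaves the
# dot-product predictions A @ B.T unchanged, but changes the cosine
# similarities between the rows of A (illustrative matrices, not the paper's).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))    # e.g. "user" embeddings
B = rng.normal(size=(5, 3))    # e.g. "item" embeddings
D = np.diag([0.1, 1.0, 10.0])  # arbitrary positive diagonal rescaling

A2 = A @ D
B2 = B @ np.linalg.inv(D)

def cos_sim(M):
    """Pairwise cosine similarities between the rows of M."""
    N = M / np.linalg.norm(M, axis=1, keepdims=True)
    return N @ N.T

print(np.allclose(A @ B.T, A2 @ B2.T))       # True: dot-product model is unchanged
print(np.allclose(cos_sim(A), cos_sim(A2)))  # False: cosine similarities differ
```

A rotation of the embedding space would leave cosine similarities intact; it is the non-uniform diagonal rescaling, which some regularization schemes fail to pin down, that makes them arbitrary.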