21 Mar 2014 | Mohammad Norouzi*, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
This paper introduces a method called "Convex Combination of Semantic Embeddings" (ConSE) for constructing an image embedding system from any existing $n$-way image classifier and a semantic word embedding model. ConSE maps images into the semantic embedding space by convexly combining the class label embedding vectors, without requiring additional training. The method leverages the strengths of both the image classifier and the semantic embedding model, achieving superior performance on the ImageNet zero-shot learning task compared to state-of-the-art methods. ConSE is evaluated on various datasets with increasing difficulty, demonstrating its effectiveness in zero-shot learning by leveraging the similarity between class labels and improving retrieval rates.
The paper also discusses the differences between ConSE and the Deep Visual-Semantic Embedding (DeViSE) method, highlighting that ConSE's simplicity and direct approach to combining semantic embeddings make it more robust and generalizable to unseen classes.
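The core idea can be sketched in a few lines: embed an image as the probability-weighted convex combination of the word vectors of the classifier's top-T predicted labels, then classify against unseen labels by cosine similarity. The sketch below is a minimal illustration under assumed inputs (a probability vector from some pretrained classifier and a matrix of label word embeddings); the function names `conse_embed` and `zero_shot_predict` are hypothetical, not from the paper.

```python
import numpy as np

def conse_embed(probs, label_embeddings, top_t=10):
    """Map an image into the semantic space via ConSE-style convex combination.

    probs: (n,) classifier probabilities over the n training labels.
    label_embeddings: (n, d) word-embedding vectors for those labels.
    """
    # Take the top-T most probable training labels.
    idx = np.argsort(probs)[::-1][:top_t]
    w = probs[idx]
    # Convex combination: weights are renormalized probabilities.
    return (w[:, None] * label_embeddings[idx]).sum(axis=0) / w.sum()

def zero_shot_predict(embedding, unseen_embeddings):
    """Pick the unseen class whose word vector is most cosine-similar."""
    sims = unseen_embeddings @ embedding / (
        np.linalg.norm(unseen_embeddings, axis=1)
        * np.linalg.norm(embedding) + 1e-12)
    return int(np.argmax(sims))
```

For example, with toy one-hot label embeddings and probabilities `[0.7, 0.2, 0.1]`, the T=2 embedding is `(0.7*e0 + 0.2*e1)/0.9`, and the nearest unseen class is found by cosine similarity.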