Evaluation of Output Embeddings for Fine-Grained Image Classification

28 Aug 2015 | Zeynep Akata*, Scott Reed†, Daniel Walter†, Honglak Lee† and Bernt Schiele*
This paper presents a structured joint embedding (SJE) framework for zero-shot image classification that relates images and class labels through input and output embeddings. The framework learns a compatibility function between the two embedding spaces and classifies an image by selecting the label whose output embedding yields the highest compatibility score. The study evaluates a range of supervised and unsupervised output embeddings, including human-annotated attributes, word embeddings learned from unlabeled text, and hierarchical embeddings derived from WordNet. Unsupervised output embeddings, particularly word embeddings trained on Wikipedia and refined with fine-grained text, achieve competitive accuracy and can even surpass previously published supervised results. Combining different output embeddings yields further gains in classification accuracy on the Animals with Attributes (AWA) and Caltech-UCSD Birds (CUB) datasets, and continuous attribute representations outperform binary ones. Evaluated on three challenging datasets, the SJE framework outperforms existing zero-shot learning methods, and the paper concludes that unsupervised output embeddings can be effectively combined with supervised ones to improve fine-grained image classification.
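To make the compatibility-based prediction step concrete, here is a minimal sketch in Python/NumPy. It assumes a bilinear compatibility score F(x, y) = θ(x)ᵀ W φ(y) between an image embedding θ(x) and a class (output) embedding φ(y); the function names, dimensions, and random toy data are illustrative assumptions, not code from the paper, and the learning of W (done in the paper with a structured ranking objective) is omitted.

```python
import numpy as np

def compatibility(theta_x, W, phi_y):
    """Bilinear compatibility score F(x, y) = theta(x)^T W phi(y)."""
    return theta_x @ W @ phi_y

def predict(theta_x, W, class_embeddings):
    """Zero-shot prediction: return the label whose output embedding
    has the highest compatibility with the image embedding."""
    scores = {label: compatibility(theta_x, W, phi_y)
              for label, phi_y in class_embeddings.items()}
    return max(scores, key=scores.get)

# Toy usage with random data (dimensions are illustrative only,
# e.g. CNN image features and attribute-style class embeddings).
rng = np.random.default_rng(0)
image_dim, class_dim = 1024, 312
W = rng.normal(size=(image_dim, class_dim))      # learned compatibility matrix
theta_x = rng.normal(size=image_dim)             # embedding of a test image
class_embeddings = {c: rng.normal(size=class_dim)
                    for c in ["sparrow", "warbler", "finch"]}
print(predict(theta_x, W, class_embeddings))
```

Because the class embeddings φ(y) come from side information (attributes, word vectors, or WordNet), prediction works for classes with no training images: only their output embeddings are needed at test time.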