Visual Relationship Detection with Language Priors


31 Jul 2016 | Cewu Lu*, Ranjay Krishna*, Michael Bernstein, Li Fei-Fei
This paper introduces a model for visual relationship detection that leverages language priors to improve performance. Visual relationships capture interactions between pairs of objects in an image, such as "man riding bicycle" or "man pushing bicycle." Because the space of possible relationships is combinatorially large, it is difficult to obtain sufficient training examples for all of them, and previous work has limited itself to predicting only a handful of relationships. The proposed model instead scales to predicting thousands of relationships from just a few examples each.

The model learns visual appearance models for objects and predicates individually, and pairs them with a language module that uses pre-trained word vectors to cast relationships into a vector space where semantically similar relationships are optimized to lie close together. The language priors from these word embeddings fine-tune the likelihood of each predicted relationship, and the objects in a predicted relationship are localized as bounding boxes in the image. The model is trained by optimizing a bi-convex objective.
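The summary describes the language module only at a high level; the following is a minimal sketch of what such a module could look like, assuming 300-dimensional pretrained word vectors (e.g. from word2vec) and one learned projection per predicate. All names and shapes here are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

EMBED_DIM = 300  # assumed dimensionality of the pretrained word vectors

def language_prior(subj_vec, obj_vec, W_k, b_k):
    """Language prior f(R) for a triple <subject, predicate k, object>.

    The relationship is cast into the word-embedding space by
    concatenating the subject and object vectors; each predicate k has
    a learned projection (W_k of shape (2 * EMBED_DIM,), scalar b_k).
    Nearby word vectors project to nearby scores, so semantically
    similar relationships receive similar priors.
    """
    phrase = np.concatenate([subj_vec, obj_vec])  # shape (2 * EMBED_DIM,)
    return float(W_k @ phrase + b_k)

# Example intuition: word2vec("man") lies close to word2vec("person"),
# so "man riding bicycle" and "person riding bicycle" get similar
# priors. This is what lets rare triples borrow strength from frequent,
# semantically similar ones.
```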
Because the language module depends only on word embeddings, the model can detect unseen relationships through zero-shot learning, leveraging similar relationships it has seen before, and it remains accurate even for relationships with very few training examples. The model is evaluated on a new dataset of 5,000 images containing 37,993 relationships, which the paper releases for further benchmarking. In these evaluations, including comparisons with existing methods and zero-shot experiments, the model outperforms previous state-of-the-art methods on visual relationship detection, and the paper further shows that understanding relationships between objects improves content-based image retrieval.
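To make the zero-shot behavior concrete, here is a hedged sketch of how the visual and language scores might be combined at inference time, assuming the two scores are multiplied to rank candidate triples; the data structures and names are illustrative assumptions, not the authors' API.

```python
def rank_relationships(visual_scores, prior_fn):
    """Rank candidate triples by visual evidence times language prior.

    `visual_scores` maps each candidate <subject, predicate, object>
    triple to its visual score V(R); `prior_fn` returns f(R) computed
    purely from word vectors (see the sketch above). Because f(R) needs
    no training examples of the specific triple, a relationship never
    seen in training still receives a meaningful prior from
    semantically similar relationships that were seen.
    """
    return sorted(
        visual_scores.items(),
        key=lambda item: item[1] * prior_fn(item[0]),
        reverse=True,  # best-scoring relationship first
    )
```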