Every Picture Tells a Story: Generating Sentences from Images

2010 | Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth
This paper presents a system that generates descriptive sentences for images and, conversely, finds images that illustrate a given sentence. The system computes a score linking an image to a sentence by comparing the meaning estimated from the image with the meaning estimated from the sentence; that score can be used either to attach a descriptive sentence to an image or to retrieve images for a sentence. Meaning is estimated by a discriminative procedure learned from data, and the key contribution is a novel representation intermediate between images and sentences: a triplet of <object, action, scene>. On top of this representation, the paper also introduces a novel, discriminative approach to sentence annotation.
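To make the linking step concrete, here is a minimal Python sketch of scoring an image-sentence pair through a shared triplet space. The vocabularies, the additive combination of scores, and the predictors (`image_scores`, `sentence_meaning`) are illustrative assumptions, not the paper's exact formulation, which learns its meaning predictors discriminatively from data.

```python
from itertools import product

# Hypothetical label vocabularies; the paper's meaning space is the
# product of object x action x scene labels (e.g. <bus, park, street>).
OBJECTS = ["bus", "horse", "dog"]
ACTIONS = ["park", "ride", "run"]
SCENES = ["street", "field", "city"]
TRIPLETS = list(product(OBJECTS, ACTIONS, SCENES))

def link_score(image_scores, sentence_scores):
    """Score an (image, sentence) pair by how well the meanings
    predicted from each side agree.

    Both arguments map every <object, action, scene> triplet to a
    confidence produced by some (here hypothetical) predictor.
    Taking the best jointly supported triplet is one simple way to
    compare the two sides; the paper's actual score may differ.
    """
    return max(image_scores[t] + sentence_scores[t] for t in TRIPLETS)

def annotate(image_scores, candidate_sentences, sentence_meaning):
    """Attach to an image the candidate sentence whose estimated
    meaning best matches the meaning estimated from the image."""
    return max(candidate_sentences,
               key=lambda s: link_score(image_scores, sentence_meaning(s)))
```

Because the score is symmetric in the two sides, the same machinery retrieves images for a sentence by ranking images instead of sentences.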
The system is evaluated on a novel dataset of 1,000 human-annotated images, with 600 used for training and 400 for testing. Evaluating sentence generation is difficult because sentences are fluid and make heavy use of synecdoche (naming a part or related concept in place of the whole), so the paper reports a quantitative Tree-F1 measure and BLEU, together with a novel score that explicitly accounts for synecdoche. Although the underlying meaning estimation is limited, the system achieves good quantitative results: it generates accurate sentences for images and finds images that best match given sentences. The paper concludes that the intermediate meaning representation is the key component of the system, because it allows distributional semantics to handle out-of-vocabulary words, recognizing objects and actions that never appeared in training.
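As a rough illustration of that out-of-vocabulary step, the sketch below maps an unseen word onto the most distributionally similar known word. Cosine similarity over sparse co-occurrence counts is an assumed stand-in for the paper's distributional similarity measure, and `vectors` and `known_words` are hypothetical inputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse co-occurrence vectors
    represented as {context_word: count} dicts."""
    dot = sum(c * v[w] for w, c in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def map_to_known(word, vectors, known_words):
    """Map an out-of-vocabulary word onto the most distributionally
    similar word the system's detectors were trained on.

    `vectors` is a word -> sparse co-occurrence vector table built
    from a large text corpus (assumed available). For example, an
    unseen "cat" could be scored via the known "dog" if the two words
    occur in similar corpus contexts.
    """
    if word in known_words:
        return word
    return max(known_words,
               key=lambda k: cosine(vectors.get(word, {}), vectors[k]))
```

This kind of substitution lets the sentence side of the score mention words the image-side detectors have never seen, which is exactly why the intermediate meaning representation matters.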