| Ranjay Krishna · Yuke Zhu · Oliver Groth · Justin Johnson · Kenji Hata · Joshua Kravitz · Stephanie Chen · Yannis Kalantidis · Li-Jia Li · David A. Shamma · Michael S. Bernstein · Li Fei-Fei
The Visual Genome dataset is designed to enable modeling of the relationships between objects in images, bridging the gap between perceptual and cognitive understanding of visual scenes. It contains over 100,000 images, each annotated with an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects. The dataset provides dense annotations in the form of region descriptions, objects, attributes, relationships, and question-answer pairs. Objects, attributes, and relationships are canonicalized to WordNet synsets, yielding a structured representation of each image that can serve as a knowledge base. With 1.7 million question-answer pairs, Visual Genome is among the largest and most diverse image datasets available, and it is designed to support tasks such as image description, question answering, and scene graph generation. The data are collected through crowdsourcing, with verification stages that ensure the accuracy and diversity of the annotations. The Visual Genome dataset aims to provide a comprehensive resource for research in computer vision and cognitive tasks.
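To make the structure of these annotations concrete, the sketch below models one image region's objects, attributes, and pairwise relationships as a small scene-graph-like structure in Python. The field names (`name`, `attributes`, `predicate`, `bbox`, and so on) are illustrative assumptions for exposition; the released Visual Genome JSON uses its own schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Minimal sketch of one region's scene-graph-style annotations,
# assuming simplified field names rather than the released JSON keys.

@dataclass
class Obj:
    object_id: int
    name: str                                   # e.g. "man"
    attributes: List[str] = field(default_factory=list)  # e.g. ["standing"]
    bbox: Tuple[int, int, int, int] = (0, 0, 0, 0)        # (x, y, w, h) in pixels

@dataclass
class Relationship:
    subject_id: int    # object_id of the subject
    predicate: str     # e.g. "holding"
    object_id: int     # object_id of the object

@dataclass
class Region:
    phrase: str                    # free-form region description
    objects: List[Obj]
    relationships: List[Relationship]

# One hypothetical region: "a man holding a red umbrella"
region = Region(
    phrase="a man holding a red umbrella",
    objects=[
        Obj(1, "man", ["standing"], (50, 40, 120, 300)),
        Obj(2, "umbrella", ["red"], (30, 10, 160, 90)),
    ],
    relationships=[Relationship(subject_id=1, predicate="holding", object_id=2)],
)
```

Aggregating such regions over an image produces the image-level scene graph; the question-answer pairs are grounded against the same objects and regions.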
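The canonicalization step can likewise be illustrated with a minimal sketch: mapping a raw object, attribute, or relationship term to a WordNet synset, here using NLTK's WordNet interface. Taking the first noun sense is a naive stand-in, assumed only for illustration; the paper's actual pipeline resolves word-sense ambiguity more carefully.

```python
from typing import Optional
from nltk.corpus import wordnet as wn   # requires a one-time nltk.download('wordnet')

def canonicalize(term: str) -> Optional[str]:
    """Map a raw annotation term to a WordNet synset name.

    Naive sketch: pick the first (most frequent) noun sense, which is an
    assumption, not the disambiguation strategy used by Visual Genome.
    """
    synsets = wn.synsets(term.replace(" ", "_"), pos=wn.NOUN)
    return synsets[0].name() if synsets else None

print(canonicalize("man"))       # -> 'man.n.01'
print(canonicalize("umbrella"))  # -> 'umbrella.n.01'
```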