20 Dec 2016 | Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
CLEVR (Compositional Language and Elementary Visual Reasoning) is a diagnostic dataset designed to evaluate the visual reasoning abilities of artificial intelligence systems. The dataset contains 100k rendered images and approximately one million automatically generated questions, with 853k unique questions. CLEVR is designed to minimize biases that models can exploit to answer questions without true reasoning, and it includes detailed annotations that describe the type of reasoning required for each question. The images depict simple 3D shapes, focusing on reasoning skills rather than recognition. The questions are complex and require various forms of reasoning, such as counting, comparing, logical reasoning, and memory. The dataset is structured with functional programs for each question, facilitating in-depth analysis of model performance. Experiments with a suite of VQA models reveal weaknesses in short-term memory, compositional reasoning, and spatial relationship understanding. CLEVR is intended to be used alongside other VQA datasets to study the reasoning capabilities of general VQA systems.
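
To make the idea of a functional program per question concrete, here is a minimal Python sketch of how a question might be expressed as a chain of simple functions evaluated over a scene. The scene layout, the function names (`filter`, `unique`, `query`, `count`), and the `run_program` helper are all illustrative assumptions, not the dataset's actual file format or function catalog.

```python
# Hypothetical toy scene: each object is a dict of attributes.
# This is an illustrative layout, not the actual CLEVR annotation format.
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cylinder", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]


def filter_attr(objects, attr, value):
    """Keep only the objects whose attribute equals the given value."""
    return [obj for obj in objects if obj[attr] == value]


def unique(objects):
    """Return the single object in the set; fail if the set is ambiguous."""
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return objects[0]


def query(obj, attr):
    """Read one attribute of a single object."""
    return obj[attr]


def count(objects):
    """Count the objects in a set."""
    return len(objects)


# Assumed function catalog for this sketch.
FUNCTIONS = {"filter": filter_attr, "unique": unique, "query": query, "count": count}


def run_program(program, scene):
    """Run a linear chain of steps; each step consumes the previous result."""
    result = scene
    for name, *args in program:
        result = FUNCTIONS[name](result, *args)
    return result


# "How many red things are there?"
HOW_MANY_RED = [("filter", "color", "red"), ("count",)]

# "What shape is the small red thing?"
WHAT_SHAPE = [
    ("filter", "color", "red"),
    ("filter", "size", "small"),
    ("unique",),
    ("query", "shape"),
]

print(run_program(HOW_MANY_RED, SCENE))
print(run_program(WHAT_SHAPE, SCENE))
```

Because each question carries an explicit program in this style, properties such as the number of steps or the kinds of functions used can be read directly off the program, which is what makes per-reasoning-type analysis of model errors possible.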