4 Sep 2019 | Kenneth Marino*1, Mohammad Rastegari2, Ali Farhadi2,3 and Roozbeh Mottaghi2
OK-VQA is a new benchmark for visual question answering (VQA) that requires external knowledge. Unlike previous VQA benchmarks, whose questions can largely be answered from image content alone, OK-VQA includes questions that require knowledge beyond what is visible in the image. The dataset contains over 14,000 questions spanning knowledge categories such as science, history, sports, and more. These questions are designed to challenge models to retrieve information from external sources, such as the web or knowledge bases, rather than relying on the image alone. The dataset is diverse, difficult, and the largest VQA dataset focused on knowledge-based questions about natural images.
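Scoring on OK-VQA follows the standard soft-accuracy metric of the original VQA benchmark, in which a predicted answer is credited in proportion to how many human annotators gave it. Below is a minimal sketch of that metric, assuming answers are already normalized (lowercased, punctuation stripped) and collected in the usual 10-answer VQA format; OK-VQA itself gathers 5 answers per question, so the released evaluation code may handle the counts slightly differently:

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Standard VQA soft accuracy (Antol et al., 2015): average, over
    leave-one-out subsets of the human answers, of min(#matches / 3, 1)."""
    scores = []
    for i in range(len(human_answers)):
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(a == predicted for a in others)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)

# Example: 2 of 10 annotators said "labrador", so the prediction
# earns partial credit rather than a hard 0/1 score.
print(vqa_accuracy("labrador", ["labrador"] * 2 + ["dog"] * 8))  # 0.6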
The OK-VQA dataset was created by randomly sampling images from the COCO dataset and asking human annotators to write questions about them that require external knowledge to answer. The questions were then filtered to ensure they genuinely required outside knowledge and were not biased toward a handful of common answers. The resulting dataset spans a wide variety of knowledge categories and is designed to test the ability of VQA models to reason over and retrieve external knowledge when answering questions.
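The answer-bias filtering can be pictured as capping how often any single most-common answer appears in the dataset. The following is a hypothetical sketch of such down-sampling; the cap value, the `top_answer` field, and the function name are illustrative assumptions, not the authors' exact procedure:

```python
import random
from collections import defaultdict

def cap_answer_frequency(questions, max_per_answer=5, seed=0):
    """Down-sample questions so that no single most-common answer
    appears more than `max_per_answer` times (hypothetical schema:
    each question dict carries a 'top_answer' field)."""
    rng = random.Random(seed)
    by_answer = defaultdict(list)
    for q in questions:
        by_answer[q["top_answer"]].append(q)
    kept = []
    for qs in by_answer.values():
        rng.shuffle(qs)
        kept.extend(qs[:max_per_answer])
    return kept
```

Capping rather than re-weighting keeps the evaluation simple: a model cannot score well by memorizing a short list of frequent answers.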
The paper evaluates several state-of-the-art VQA models on OK-VQA and shows that their performance drops significantly relative to their results on standard VQA benchmarks, indicating that models must incorporate external knowledge to answer these questions. The paper also introduces a baseline method called ArticleNet, which retrieves Wikipedia articles and uses them to answer questions. The results show that combining state-of-the-art VQA models with such knowledge-retrieval methods improves performance on the dataset.
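ArticleNet first builds search queries from question words and visual concepts detected in the image, then retrieves candidate Wikipedia articles and learns to score which sentences contain the answer. Below is a rough sketch of the retrieval step against the public MediaWiki search API; the query-composition heuristic is a simplification (the paper uses learned extractors to pick query words), and the sketch assumes the third-party `requests` package:

```python
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"

def make_queries(question_words, image_labels):
    # Simplified stand-in for ArticleNet's learned query extraction:
    # pair each question keyword with each detected visual concept.
    return [f"{w} {label}" for w in question_words for label in image_labels]

def search_wikipedia(query, limit=5):
    """Return titles of the top-ranked Wikipedia articles for a query."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    resp = requests.get(WIKI_API, params=params, timeout=10)
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"]]

# e.g. for "What breed of dog is shown?" with a detected "dog":
for q in make_queries(["breed"], ["dog"]):
    print(q, "->", search_wikipedia(q))
```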
The paper also provides a statistical analysis of the dataset, showing that it is diverse and difficult, and the benchmark results demonstrate that there is substantial room for improvement in VQA models that must draw on external knowledge. The paper concludes that OK-VQA is an important benchmark for evaluating the ability of VQA models to use external knowledge to answer questions.