4 Sep 2019 | Kenneth Marino, Mohammad Rastegari, Ali Farhadi and Roozbeh Mottaghi
The paper introduces OK-VQA (Outside Knowledge VQA), a new benchmark for visual question answering that focuses on knowledge-based questions. Unlike traditional VQA datasets, which largely involve counting, visual attributes, and object detection, OK-VQA contains over 14,000 questions that require external knowledge to answer. The authors highlight the limitations of current VQA models on such questions and show that state-of-the-art models degrade significantly on the new dataset. They also provide a detailed analysis of the dataset's properties and statistics, showing that it is diverse, challenging, and the largest VQA dataset specifically designed for knowledge-based VQA. Experiments with state-of-the-art VQA models and baseline approaches underscore the importance of incorporating external knowledge for effective VQA performance. The authors aim to inspire further research in this area and to provide a valuable resource for the community.
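Models on benchmarks like this are typically scored with the soft VQA accuracy metric, which credits a prediction in proportion to how many human annotators gave the same answer. The sketch below is a minimal illustration of that metric, not the authors' evaluation code; the function names, the assumption of several (e.g., five) human answers per question, and the simple lowercase normalization are all assumptions for the example.

```python
from collections import Counter

def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Soft VQA accuracy: an answer counts as fully correct if at least
    three annotators gave it, and partially correct otherwise."""
    pred = predicted.strip().lower()
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[pred] / 3.0, 1.0)

def evaluate(predictions: dict, annotations: dict) -> float:
    """Average per-question accuracy over a dataset.
    `predictions` maps question_id -> predicted answer string;
    `annotations` maps question_id -> list of human answer strings."""
    scores = [vqa_accuracy(predictions[qid], answers)
              for qid, answers in annotations.items()]
    return sum(scores) / len(scores) if scores else 0.0

# Example: five annotators, three of whom answered "labrador".
print(vqa_accuracy("labrador",
                   ["labrador", "labrador", "labrador", "dog", "golden retriever"]))  # 1.0
print(vqa_accuracy("dog",
                   ["labrador", "labrador", "labrador", "dog", "golden retriever"]))  # ~0.33
```

Under this scoring, a model that answers with a plausible but less common synonym still receives partial credit, which is one reason knowledge-based questions with open-ended answers remain hard to evaluate and to solve.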