Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

14 Mar 2018 | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
The AI2 Reasoning Challenge (ARC) is a question-answering dataset and competition designed to encourage research in advanced reasoning and knowledge-based question answering. The dataset consists of 7,787 natural science questions, drawn primarily from standardized tests and partitioned into a Challenge Set (2,590 questions) and an Easy Set (5,197 questions). The Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm, making it substantially harder than previous datasets such as SQuAD or SNLI. The authors also release the ARC Corpus, a corpus of 14 million science-relevant sentences, and three neural baseline models (DecompAttn, BiDAF, and DGEM). Although these baselines perform well on the Easy Set, none of them significantly outperforms random guessing on the Challenge Set, underscoring the difficulty of the task. The challenge aims to push the boundaries of AI in advanced question answering and is open to all researchers.
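
To make the partition criterion concrete, below is a minimal sketch of a word co-occurrence baseline of the kind the Challenge Set is defined against. It is an illustration under simplifying assumptions (a toy two-sentence corpus standing in for the ARC Corpus, a naive tokenizer, and an ad hoc stopword list), not the authors' actual filter: each answer option is scored by how strongly its words co-occur with the question's words in corpus sentences, and the highest-scoring option is chosen. Challenge questions are, by construction, ones that baselines like this answer incorrectly.

import re

# Hedged sketch: toy stopword list and scoring are assumptions made for
# illustration, not the paper's exact word co-occurrence algorithm.
STOPWORDS = {"a", "an", "the", "by", "of", "to", "in", "on",
             "how", "do", "is", "or", "from", "other"}

def tokenize(text: str) -> set[str]:
    """Lowercase, split on letters, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def score_option(question: str, option: str, corpus: list[str]) -> float:
    """Sum, over corpus sentences, the co-occurrence of question and option tokens."""
    q_tokens = tokenize(question)
    o_tokens = tokenize(option)
    score = 0.0
    for sentence in corpus:
        s_tokens = tokenize(sentence)
        q_hits = len(q_tokens & s_tokens)
        o_hits = len(o_tokens & s_tokens)
        # A sentence supports an option only if it mentions words from
        # both the question and the option.
        if q_hits and o_hits:
            score += q_hits * o_hits
    return score

def answer(question: str, options: dict[str, str], corpus: list[str]) -> str:
    """Pick the option whose tokens co-occur most with the question's tokens."""
    return max(options, key=lambda label: score_option(question, options[label], corpus))

if __name__ == "__main__":
    # Tiny toy corpus standing in for the 14M-sentence ARC Corpus.
    corpus = [
        "Plants use photosynthesis to convert sunlight into chemical energy.",
        "Animals obtain energy by consuming plants or other animals.",
    ]
    question = "How do plants obtain energy?"
    options = {"A": "by photosynthesis from sunlight", "B": "by eating other animals"}
    print(answer(question, options, corpus))  # prints: A

A baseline like this succeeds whenever the correct answer shares surface vocabulary with corpus sentences about the question topic, which is exactly why questions it can solve are routed to the Easy Set rather than the Challenge Set.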