7 Feb 2017 | Adam Trischler*, Tong Wang*, Xingdi Yuan*, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
NewsQA is a large-scale machine comprehension dataset containing over 100,000 human-generated question-answer pairs based on 12,744 CNN news articles. The data were collected through a four-stage process designed to encourage exploratory, curiosity-based questions. Each answer is a span of text from the corresponding article, and some questions have no answer in the article (a null span).

The dataset is designed to challenge models beyond simple word matching and textual entailment by requiring reasoning, synthesis, and inference. The paper compares NewsQA with other datasets, notably SQuAD, which also pairs crowdsourced questions with answer spans but draws its passages from Wikipedia articles rather than news. NewsQA presents a greater challenge because of its complexity: answer spans are longer, no candidate answers are provided, and more questions require advanced reasoning. Human performance on NewsQA was measured and compared with neural models, revealing a significant gap of 0.198 F1 between humans and machines. The dataset is freely available for research and is intended to push the development of more intelligent machine comprehension systems.

NewsQA covers a wide range of answer types, such as common noun phrases and clause phrases, among others. The reasoning types needed to answer its questions include word matching, paraphrasing, inference, and synthesis; some questions are ambiguous or lack sufficient information to be answered. Machine comprehension models evaluated on the dataset, including the BARB model, perform below human level. A sentence-level accuracy evaluation likewise shows that NewsQA is more challenging than SQuAD. Overall, NewsQA is a significant extension of existing comprehension datasets and offers a new benchmark for machine comprehension research.
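Because NewsQA answers are article spans (possibly null) and the human-machine gap is reported in F1, it may help to see how a span-level F1 score is typically computed. The following is a minimal sketch of a SQuAD-style token-overlap F1, assuming that `None` marks a null span and that two null spans count as a perfect match while a null paired with a non-null span scores zero. The normalization, the null convention, and the helper names `normalize_tokens`, `span_f1`, and `mean_f1` are assumptions for illustration, not necessarily NewsQA's official evaluation procedure.

```python
import re
from collections import Counter
from typing import Optional


def normalize_tokens(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a simplified, assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def span_f1(prediction: Optional[str], gold: Optional[str]) -> float:
    """Token-overlap F1 between a predicted and a gold answer span.

    None stands for a null span (no answer in the article). Assumed convention:
    two null spans score 1.0; a null paired with a non-null span scores 0.0.
    """
    if prediction is None or gold is None:
        return float(prediction is None and gold is None)
    pred_tokens = normalize_tokens(prediction)
    gold_tokens = normalize_tokens(gold)
    # Multiset intersection: each shared token counts at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0  # also covers empty strings, avoiding division by zero
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[Optional[str], Optional[str]]]) -> float:
    """Average F1 over (prediction, gold) pairs."""
    return sum(span_f1(p, g) for p, g in pairs) / len(pairs)


# Partial credit: two of three tokens overlap in each direction.
print(round(span_f1("the Obama administration", "Obama administration officials"), 3))  # 0.667
```

Under a metric like this, scores lie between 0 and 1, so a gap of 0.198 means the machine's average F1 is roughly 0.2 below the human average on that scale.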
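The summary notes that a sentence-level accuracy evaluation shows NewsQA to be harder than SQuAD. One plausible way to compute such a metric is sketched below: a question counts as correct when the gold answer begins inside the sentence a model selects. Everything here is hypothetical and not the paper's setup: the naive regex sentence splitter, the `(article, question, answer_start)` record layout, and the word-overlap selector `pick_sentence`, which merely stands in for a word-matching model. Null-span questions are skipped because they have no answer sentence.

```python
import re
from typing import Optional


def split_sentences(article: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of each sentence.

    Naive assumption: a sentence is a run of text ending in '.', '!' or '?'.
    Abbreviations and decimals will be split incorrectly.
    """
    return [(m.start(), m.end())
            for m in re.finditer(r"[^.!?]+[.!?]*", article)
            if m.group().strip()]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_sentence(article: str, question: str,
                  sentences: list[tuple[int, int]]) -> int:
    """Hypothetical word-matching baseline: index of the sentence sharing the
    most tokens with the question (the first such sentence wins ties).
    Assumes the article contains at least one sentence."""
    q = tokens(question)
    scores = [len(q & tokens(article[s:e])) for s, e in sentences]
    return max(range(len(sentences)), key=scores.__getitem__)


def sentence_accuracy(examples: list[tuple[str, str, Optional[int]]]) -> float:
    """Fraction of answerable questions whose gold answer starts inside the
    selected sentence. Each example is (article, question, answer_start), where
    answer_start is a character offset, or None for a null span (skipped)."""
    correct = total = 0
    for article, question, answer_start in examples:
        if answer_start is None:
            continue
        sentences = split_sentences(article)
        start, end = sentences[pick_sentence(article, question, sentences)]
        total += 1
        correct += start <= answer_start < end
    return correct / total if total else 0.0


article = "The storm hit Miami on Monday. Officials ordered an evacuation."
examples = [
    (article, "When did the storm hit Miami?", article.index("Monday")),
    (article, "What did officials order?", article.index("an evacuation")),
    # Paraphrase ("order" vs. "ordered") defeats exact word matching here.
    (article, "Who gave the order?", article.index("Officials")),
    (article, "Was anyone hurt?", None),  # null span, skipped
]
print(sentence_accuracy(examples))  # 0.6666666666666666
```

The third example illustrates why paraphrasing questions are harder than word-matching ones: the overlap baseline picks the wrong sentence because the question and the answer sentence share no exact tokens.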