7 Feb 2017 | Adam Trischler*, Tong Wang*, Xingdi Yuan*, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
The paper introduces *NewsQA*, a large-scale machine comprehension dataset containing over 100,000 human-generated question-answer pairs. The dataset is constructed through a four-stage process in which crowdworkers read CNN articles, pose questions, and determine answers. *NewsQA* is designed to challenge models with complex reasoning tasks, such as synthesizing information from multiple sentences and inferring answers from incomplete information. The authors analyze the dataset's characteristics, including the types of answers and the forms of reasoning required. They compare human performance on *NewsQA* with that of two neural models, finding a significant gap (0.198 F1) that suggests room for improvement in machine comprehension research. The dataset is freely available for research purposes.
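The F1 figure quoted above is the token-overlap F1 commonly used for span-style answers (as popularized by SQuAD). A minimal sketch of that metric, assuming simple lowercasing and whitespace tokenization (the exact normalization used for *NewsQA* may differ), might look like:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer span and a reference span."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count tokens shared between prediction and reference (multiset intersection).
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a partially overlapping prediction scores between 0 and 1.
print(token_f1("the CNN article", "a CNN news article"))  # ~0.571
```

Under this metric, a 0.198 F1 gap means that, averaged over questions, the models' predicted spans overlap the human reference answers substantially less than a second human annotator's answers do.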