SQuAD: 100,000+ Questions for Machine Comprehension of Text

11 Oct 2016 | Pranav Rajpurkar and Jian Zhang and Konstantin Lopyrev and Percy Liang
The Stanford Question Answering Dataset (SQuAD) is a large-scale reading comprehension dataset of more than 100,000 questions posed by crowdworkers on Wikipedia articles. The answer to each question is a segment of text (a span) from the corresponding passage. The dataset addresses the need for a large, high-quality resource for training modern data-intensive reading comprehension models. SQuAD is significantly larger than previous datasets and provides no answer choices, so a system must select the correct span from all possible candidates in the passage. The authors analyze the dataset with dependency and constituency trees to understand the types of reasoning required to answer the questions. They build a logistic regression model that achieves an F1 score of 51.0%, which outperforms a simple baseline but still falls well short of human performance (86.8%). The dataset is freely available and has sparked significant interest in developing more advanced models.
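The F1 scores quoted above compare a predicted answer span with a reference span. The summary does not spell out the computation, so the following is only a minimal sketch, assuming the token-overlap F1 commonly used for span-based question answering. The function names `token_f1` and `best_f1` are hypothetical, and the normalization here (lowercasing and whitespace splitting) is a simplifying assumption rather than the official evaluation procedure.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and one reference span.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, gold_answers: list[str]) -> float:
    """Score a prediction against several references, keeping the best match."""
    return max(token_f1(prediction, gold) for gold in gold_answers)
```

For example, a prediction of "in the late 1990s" against the reference "late 1990s" shares two tokens, giving precision 2/4, recall 2/2, and an F1 of about 0.67. A span that contains the right answer but is too long is therefore penalized rather than counted as fully correct.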