24 May 2019 | Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova
This paper introduces BoolQ, a new reading comprehension dataset consisting of naturally occurring yes/no questions. The questions are gathered in unprompted and unconstrained settings and are challenging because they often require complex, non-factoid knowledge and entailment-like inference to solve. The dataset was created by collecting roughly 16,000 naturally occurring yes/no questions from anonymized search queries, each paired with a relevant Wikipedia passage. Each question is labeled as "yes" or "no" based on the passage.
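For a concrete look at the format, here is a minimal sketch that loads the public BoolQ release with the Hugging Face `datasets` library; the `boolq` Hub identifier and its `question`/`passage`/`answer` fields come from that public release, not from the paper itself:

```python
from datasets import load_dataset

# Load the public BoolQ release (train/validation splits) from the
# Hugging Face Hub; each record is a question/passage/answer triple.
boolq = load_dataset("boolq")

example = boolq["train"][0]
print(example["question"])       # a naturally occurring yes/no question
print(example["passage"][:200])  # the paired Wikipedia passage
print(example["answer"])         # True ("yes") or False ("no")
```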
The study explores the effectiveness of various transfer learning baselines for yes/no QA. It finds that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it remains beneficial even when starting from large pre-trained language models like BERT. The best method first trains BERT on MultiNLI and then fine-tunes it on the BoolQ dataset, achieving 80.4% accuracy, compared with 90% accuracy for human annotators and 62% for the majority baseline.
The dataset includes a variety of question types, such as those requiring existence, event occurrence, or definitional knowledge. The questions often require subtle inference grounded in how the passage is written, going beyond simple fact retrieval. The dataset is thus designed to test inferential abilities while remaining directly tied to the practical task of answering user yes/no questions.
The paper also evaluates various models for yes/no QA, including shallow models, neural models, and transfer learning approaches. It finds that transfer learning from entailment data, such as MultiNLI, significantly improves performance. The best results are achieved by combining pre-training on MultiNLI with fine-tuning on the BoolQ dataset, achieving 80.43% accuracy. The study highlights the difficulty of the dataset and the importance of transfer learning for improving performance on natural yes/no questions.
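As a rough illustration of that two-stage recipe, the sketch below fine-tunes BERT first on MultiNLI and then on BoolQ using the Hugging Face `transformers` Trainer; the checkpoint names, output directories, and default hyperparameters are illustrative assumptions, not the authors' exact configuration:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

def tokenize_pairs(batch, first, second):
    # Encode the two text fields as a single BERT sentence pair.
    return tokenizer(batch[first], batch[second],
                     truncation=True, max_length=512)

# Stage 1: fine-tune BERT on MultiNLI (3-way entailment classification).
mnli = load_dataset("multi_nli").map(
    lambda b: tokenize_pairs(b, "premise", "hypothesis"), batched=True)
nli_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=3)
nli_trainer = Trainer(
    model=nli_model, tokenizer=tokenizer,
    args=TrainingArguments(output_dir="bert-mnli"),
    train_dataset=mnli["train"])
nli_trainer.train()
nli_trainer.save_model("bert-mnli")  # checkpoint reused in stage 2

# Stage 2: reuse the MultiNLI-tuned encoder, re-initialize a 2-way head,
# and fine-tune on BoolQ (question + passage -> yes/no).
boolq = load_dataset("boolq").map(
    lambda b: tokenize_pairs(b, "question", "passage"), batched=True)
boolq = boolq.map(lambda b: {"label": [int(a) for a in b["answer"]]},
                  batched=True)
qa_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-mnli", num_labels=2, ignore_mismatched_sizes=True)
qa_trainer = Trainer(
    model=qa_model, tokenizer=tokenizer,
    args=TrainingArguments(output_dir="bert-boolq"),
    train_dataset=boolq["train"], eval_dataset=boolq["validation"])
qa_trainer.train()
```

Re-initializing only the classification head while keeping the MultiNLI-tuned encoder weights is what lets the entailment signal carry over to the yes/no task.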