5 Dec 2017 | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
RACE is a new large-scale reading comprehension dataset collected from English exams for Chinese middle and high school students aged 12-18. It contains nearly 28,000 passages and nearly 100,000 questions written by human experts, covering a wide range of topics. Because the exams are designed to test students' understanding and reasoning, a substantially larger proportion of RACE's questions require reasoning than in other benchmark datasets. State-of-the-art models reach only 43% accuracy on RACE, while the ceiling human performance is 95%. RACE is freely available for research and evaluation in machine comprehension.
RACE is constructed to address limitations of existing datasets, such as the restriction of answers to spans of the passage, noise introduced by crowd-sourced or automatically generated questions, and limited topic coverage. Unlike existing datasets, RACE allows answers to be any words, not just text spans from the passage. It also covers a broad range of topics and writing styles, making it a comprehensive resource for evaluating machine reading comprehension.
The dataset includes two important types of reasoning, passage summarization and attitude analysis, which are absent from other large-scale datasets. RACE-M and RACE-H are subsets drawn from middle school and high school exams, respectively. Human annotation of the dataset shows that 59.2% of questions require reasoning, compared to 21%, 20.5%, and 33.9% for CNN, SQuAD, and NewsQA.
Several state-of-the-art models, including the Sliding Window algorithm, the Stanford Attentive Reader, and the Gated-Attention Reader, are evaluated on RACE. Their performance is compared to human performance: the best model achieves 43.3% accuracy on RACE, while the ceiling human performance is 95.4% on RACE-M and 94.2% on RACE-H. The results show that RACE is the most challenging dataset among existing large-scale machine comprehension datasets.
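Of the baselines above, the Sliding Window algorithm is simple enough to sketch. The sketch below is a minimal, unweighted variant for multiple-choice questions: each option is scored by the best bag-of-words overlap between (question + option) and any passage window of the same length, and the highest-scoring option wins. Function names are illustrative, and the published algorithm additionally weights words by inverse frequency, which is omitted here.

```python
def sliding_window_score(passage_tokens, target_tokens):
    """Max overlap between the target bag of words and any
    passage window of the same length (simplified, unweighted)."""
    w = len(target_tokens)
    target = set(target_tokens)
    best = 0
    for i in range(max(1, len(passage_tokens) - w + 1)):
        window = passage_tokens[i:i + w]
        best = max(best, sum(1 for tok in window if tok in target))
    return best


def answer_question(passage, question, options):
    """Pick the option whose (question + option) words best match
    some contiguous window of the passage."""
    p = passage.lower().split()
    scores = [
        sliding_window_score(p, (question + " " + opt).lower().split())
        for opt in options
    ]
    return max(range(len(options)), key=scores.__getitem__)
```

Because such lexical-matching baselines only reward surface overlap with the passage, they are expected to fail on exactly the reasoning-heavy questions (summarization, attitude analysis) that make up most of RACE, which is consistent with the large human-machine gap reported above.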