13 May 2017 | Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer
The paper introduces TriviaQA, a large-scale reading comprehension dataset containing over 650K question-answer triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, with an average of six supporting documents per question. The dataset is designed to test complex, compositional questions, syntactic and lexical variability, and multi-sentence reasoning. The authors present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network, both of which perform poorly compared to human performance (23% and 40% vs. 80%), highlighting the dataset's challenge. The paper also includes a manual analysis of the dataset's quality and challenges, and discusses the dataset's unique features and potential for future research.
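The triple structure described above — each question-answer pair paired with several independently gathered evidence documents — can be sketched as a simple schema. This is a minimal illustration only; the field names are hypothetical and do not reflect the dataset's actual release format:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TriviaQAExample:
    """One question-answer pair with its independently gathered evidence.

    Field names are illustrative, not the dataset's actual JSON keys.
    """
    question: str
    answer: str
    evidence_docs: List[str] = field(default_factory=list)  # ~6 docs per question on average


# A toy example showing how one QA pair expands into multiple
# (question, evidence document, answer) triples, which is how
# 95K pairs yield over 650K triples.
ex = TriviaQAExample(
    question="Which country hosted the 1966 FIFA World Cup?",
    answer="England",
    evidence_docs=["doc A about the 1966 World Cup", "doc B about English football"],
)
triples = [(ex.question, doc, ex.answer) for doc in ex.evidence_docs]
print(len(triples))  # one triple per supporting document
```

The expansion step is the key point: evaluation is per question-document pair, so each supporting document contributes its own triple.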