WIKIQA: A Challenge Dataset for Open-Domain Question Answering

2015 | Yi Yang, Wen-tau Yih, Christopher Meek
The paper introduces WIKIQA, a dataset for open-domain question answering constructed from Bing query logs and Wikipedia pages. Unlike the QASENT dataset, which is based on TREC-QA data and has a biased selection process, WIKIQA aims to reflect more natural and realistic question-answering scenarios. The dataset includes 3,047 questions and 29,258 candidate answer sentences, and only about one-third of the questions have a correct answer among their candidates. The paper compares several systems on both datasets and finds that lexical semantic methods perform well on QASENT, while sentence semantic models (e.g., convolutional neural networks) outperform lexical methods on WIKIQA.
The paper also introduces the task of answer triggering: detecting whether the candidate sentences for a question contain any correct answer at all. It evaluates a system on this task using question-level precision, recall, and F1 scores. The results suggest that deeper semantic understanding and answer inference are crucial for effective QA systems.
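
To make the question-level metrics concrete, here is a minimal Python sketch of how answer triggering could be scored. It rests on assumptions not stated in this summary: the system proposes only its top-scoring candidate for each question, it "triggers" only when that score exceeds a threshold, and a triggered question counts as correct only if the proposed sentence is labeled correct. The function name, the input layout, and the threshold value are hypothetical.

```python
from typing import List, Tuple

# One question = list of (model_score, is_correct_answer) pairs,
# one pair per candidate sentence. This layout is an assumption.
Candidates = List[Tuple[float, bool]]


def answer_triggering_prf(
    questions: List[Candidates], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Question-level precision, recall, and F1 (illustrative sketch)."""
    triggered = 0   # questions where the system proposes an answer
    correct = 0     # triggered questions whose proposed sentence is correct
    answerable = 0  # questions with at least one correct candidate

    for candidates in questions:
        if any(is_correct for _, is_correct in candidates):
            answerable += 1
        if not candidates:
            continue
        # Assumed decision rule: propose only the top-scoring sentence.
        best_score, best_is_correct = max(candidates, key=lambda c: c[0])
        if best_score > threshold:
            triggered += 1
            if best_is_correct:
                correct += 1

    precision = correct / triggered if triggered else 0.0
    recall = correct / answerable if answerable else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    toy = [
        [(0.9, True), (0.2, False)],   # triggered, correct
        [(0.8, False), (0.3, True)],   # triggered, wrong sentence
        [(0.1, False), (0.2, False)],  # no answer, correctly not triggered
        [(0.4, True)],                 # answerable, but not triggered
    ]
    print(answer_triggering_prf(toy, threshold=0.5))
```

Under this rule, a question with no correct candidate can only hurt precision (if the system triggers on it), while an answerable question that the system skips or answers with the wrong sentence lowers recall.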