27 Jun 2019 | Kenton Lee, Ming-Wei Chang, Kristina Toutanova
This paper introduces ORQA, an end-to-end open-domain question answering system that jointly learns the retriever and reader from question-answer string pairs alone, without relying on a black-box information retrieval (IR) system. ORQA treats evidence retrieval from Wikipedia as a latent variable and pre-trains the retriever with an Inverse Cloze Task (ICT), in which a sentence serves as a pseudo-query and the model learns to predict the context it was taken from. Both components are built on BERT encoders: the retriever scores question-evidence pairs with an inner product of their dense encodings, and the reader identifies the answer span within the retrieved evidence block. Because retrieval is learned rather than fixed, ORQA can retrieve any text in the open corpus instead of being limited to the closed set returned by a black-box IR system. Evaluated on open versions of five QA datasets, learned retrieval outperforms BM25 by up to 19 points in exact match on datasets where the question writers do not already know the answer, demonstrating that ORQA can learn to retrieve evidence directly from an open corpus without a pre-defined IR system.
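To make the ICT pre-training objective concrete, here is a minimal sketch of its in-batch softmax loss. A hashed bag-of-words encoder (`toy_encode`, a hypothetical stand-in) replaces the separate BERT towers ORQA actually uses; only the shape of the objective is meant to match the paper.

```python
import zlib
import numpy as np

def toy_encode(text, dim=32):
    """Toy stand-in for a BERT encoder: L2-normalized hashed bag-of-words."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def ict_batch_loss(sentences, contexts, dim=32):
    """In-batch softmax loss for the Inverse Cloze Task.

    Each sentence acts as a pseudo-query whose true surrounding context is
    the positive; the other contexts in the batch serve as negatives.
    """
    q = np.stack([toy_encode(s, dim) for s in sentences])  # (B, d) pseudo-queries
    c = np.stack([toy_encode(t, dim) for t in contexts])   # (B, d) contexts
    scores = q @ c.T                                       # inner-product retrieval scores
    # Log-softmax over the batch; the target for row i is column i.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In the paper, this pre-training gives the retriever a useful notion of semantic relevance before any question-answer supervision is seen.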
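The learned retrieval step itself can be sketched as scoring every evidence block by an inner product with the question encoding and keeping the top-k blocks. Again, a hashed bag-of-words encoder is a hypothetical stand-in for ORQA's BERT encoders, and `retrieve_top_k` is an illustrative name, not the paper's API.

```python
import zlib
import numpy as np

def toy_encode(text, dim=32):
    """Toy stand-in for a BERT encoder: L2-normalized hashed bag-of-words."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve_top_k(question, blocks, k=2, dim=32):
    """Dense retrieval over an open corpus: score all evidence blocks by
    inner product with the question encoding, return (index, score) pairs
    sorted by descending score."""
    q = toy_encode(question, dim)
    block_vecs = np.stack([toy_encode(b, dim) for b in blocks])
    scores = block_vecs @ q
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

In ORQA this scoring is run over precomputed encodings of all Wikipedia blocks, so retrieval is not restricted to a closed candidate set; the reader then extracts an answer span from the top-scoring blocks.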