This paper introduces a novel generate-then-ground (GenGround) framework for multi-hop question answering (MHQA), which synergizes the parametric knowledge of large language models (LLMs) with external documents to answer complex questions. The framework alternates between two phases until the original question is resolved: (1) deducing a simpler, single-hop question and directly generating its answer, and (2) grounding the question-answer pair in retrieved documents to correct any incorrect prediction. To generalize the framework to smaller models, an instructional grounding distillation method is proposed, which distills the grounding capability of a larger model such as ChatGPT into smaller ones.
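A minimal sketch of this alternation is shown below; the `llm` and `retrieve` helpers, the prompt wording, and the stopping check are assumptions made for illustration rather than the paper's actual implementation.

```python
# Hypothetical helpers: llm(prompt) -> str wraps an LLM call (e.g. ChatGPT),
# retrieve(query) -> list[str] returns top-k documents for a query.
from typing import Callable, List


def gen_ground_qa(question: str,
                  llm: Callable[[str], str],
                  retrieve: Callable[[str], List[str]],
                  max_hops: int = 4) -> str:
    """Answer a multi-hop question by alternating deduction and grounding."""
    steps: List[str] = []  # grounded single-hop QA pairs resolved so far
    for _ in range(max_hops):
        context = "\n".join(steps)

        # Phase 1: answer deduction -- pose the next single-hop question and
        # answer it directly from the model's parametric knowledge.
        draft = llm(f"Question: {question}\nResolved so far:\n{context}\n"
                    "Propose the next single-hop question and answer it.")

        # Phase 2: grounding -- retrieve evidence for the drafted QA pair and
        # let the model correct the answer against the documents.
        docs = retrieve(draft)
        grounded = llm(f"Tentative step:\n{draft}\nEvidence:\n"
                       + "\n".join(docs)
                       + "\nRevise the answer so it is supported by the evidence.")
        steps.append(grounded)

        # Check whether the original question can now be answered.
        verdict = llm(f"Question: {question}\nResolved steps:\n"
                      + "\n".join(steps)
                      + "\nGive the final answer, or reply CONTINUE if more hops are needed.")
        if verdict.strip().upper() != "CONTINUE":
            return verdict.strip()
    return steps[-1] if steps else ""
```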
The GenGround framework addresses the limitations of the retrieve-then-read paradigm, which is constrained by the performance of retrievers and the noise in retrieved documents. By generating answers first and then grounding them in external documents, the framework reduces the risk of hallucinations and improves the accuracy of multi-hop QA. The framework is evaluated on four MHQA datasets, demonstrating superior performance compared to existing methods such as ReAct and DSPy.
The framework also includes a batch grounding strategy to efficiently use retrieved documents and reduce the impact of noise. The instructional grounding distillation method is tested on the Natural Questions (NQ) dataset, where it successfully distills the knowledge of ChatGPT into smaller models like Mistral-7B, achieving strong performance.
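One plausible way to organize the batch grounding step is sketched below; the `llm` helper, the IRRELEVANT sentinel, and the batch size are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical batch grounding: documents are checked a few at a time instead
# of stuffing the full retrieval list into one prompt, which keeps prompts
# short and limits the influence of noisy documents.
from typing import Callable, List


def batch_ground(qa_pair: str,
                 docs: List[str],
                 llm: Callable[[str], str],
                 batch_size: int = 3) -> str:
    """Ground a deduced single-hop QA pair against documents, batch by batch."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        response = llm(f"QA pair:\n{qa_pair}\nEvidence batch:\n"
                       + "\n".join(batch)
                       + "\nIf the evidence supports or corrects the answer, return the "
                         "revised QA pair; otherwise reply IRRELEVANT.")
        if response.strip().upper() != "IRRELEVANT":
            return response.strip()  # grounded (and possibly corrected) QA pair
    return qa_pair  # no batch was relevant; fall back to the deduced answer
```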
An ablation study shows that both the answer deduction and grounding phases are essential for effective multi-hop QA. The results indicate that GenGround effectively combines the parametric knowledge of LLMs with external documents to answer complex questions, and the framework is also efficient, consuming fewer tokens than strong baselines.
The study highlights the importance of incorporating both LLM parametric knowledge and external documents in multi-hop QA tasks. The framework is adaptable to different retrieval methods and performs well in both high and low recall scenarios. The results demonstrate that the GenGround framework is a promising approach for multi-hop QA, offering improved accuracy and efficiency compared to existing methods.