The paper introduces a novel framework called Generate-then-Ground (GenGround) for multi-hop question answering (MHQA) tasks, which integrates the parametric knowledge of large language models (LLMs) with external documents. GenGround alternates between two phases: (1) formulating a simpler, single-hop question and generating an answer directly, and (2) grounding the question-answer pair in retrieved documents to revise any incorrect predictions. This approach aims to mitigate the limitations of the traditional retrieve-then-read paradigm, which is constrained by the performance of the retriever and by noise in the retrieved documents. The paper also proposes an instructional grounding distillation method to adapt the framework to smaller models. Extensive experiments on four datasets show that GenGround outperforms strong baselines and achieves the best overall results. The method effectively synergizes LLMs' deductive abilities with external knowledge, improving the accuracy and reliability of answers in multi-hop question answering.
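To make the alternation concrete, the following is a minimal sketch of a generate-then-ground loop as described above. The callables `decompose`, `answer`, `retrieve`, and `ground` are hypothetical stand-ins for the paper's LLM prompts and retriever, supplied by the caller; this is an illustration of the control flow, not the authors' released implementation.

```python
from typing import Callable, List, Optional, Tuple

def gen_ground(
    question: str,
    decompose: Callable[[str, List[Tuple[str, str]]], Optional[str]],
    answer: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    ground: Callable[[str, str, List[str]], str],
    max_hops: int = 4,
) -> str:
    """Alternate between direct answer generation and document grounding."""
    history: List[Tuple[str, str]] = []  # resolved (sub_question, answer) pairs
    for _ in range(max_hops):
        # Phase 1: deduce the next single-hop question and answer it
        # from the LLM's parametric knowledge alone.
        sub_q = decompose(question, history)
        if sub_q is None:  # the model signals the reasoning chain is complete
            break
        draft = answer(sub_q)

        # Phase 2: ground the question-answer pair in retrieved documents,
        # revising the draft whenever the evidence contradicts it.
        docs = retrieve(sub_q)
        history.append((sub_q, ground(sub_q, draft, docs)))

    # The last grounded answer resolves the original multi-hop question.
    return history[-1][1] if history else ""
```

In this sketch, `decompose` is assumed to return `None` once no further single-hop question is needed, and `ground` returns either the draft answer (if supported by the evidence) or a revised one, mirroring the paper's two-phase alternation.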