23 Jul 2024 | Constanza Fierro, Reinald Kim Amplayo, Fantine Huot, Nicola De Cao, Joshua Maynez, Shashi Narayan, Mirella Lapata
This paper explores the attribution capabilities of plan-based models in generating text with citations. The authors propose two attribution models: an abstractive model that generates questions from scratch and an extractive model that copies questions from the input. Experiments on long-form question answering show that planning consistently improves attribution quality. Moreover, the citations generated by blueprint models are more accurate than those obtained from LLM-based pipelines that lack a planning component.
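As a toy illustration of the distinction (my own sketch, not the paper's implementation): the extractive planner selects questions that already appear in its input, e.g. questions pre-generated from the retrieved passages, while the abstractive planner generates new questions free-form with a trained model (stubbed below).

```python
def extractive_plan(query: str, candidate_questions: list[str], k: int = 2) -> list[str]:
    """Copy the k candidate questions sharing the most words with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(candidate_questions,
                    key=lambda q: len(q_terms & set(q.lower().split())),
                    reverse=True)
    return ranked[:k]


def abstractive_plan(query: str) -> list[str]:
    """Placeholder: the abstractive model would generate new questions from scratch."""
    return [query]  # stub; the fine-tuned model itself is not reproduced here


candidates = [
    "How long is the Nile?",
    "Which countries does the Nile flow through?",
    "What is the capital of Egypt?",
]
print(extractive_plan("How long is the Nile and where does it flow?", candidates))
print(abstractive_plan("How long is the Nile and where does it flow?"))
```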
The study focuses on long-form question answering, where the goal is to generate summaries from a set of passages that answer a specific query. The authors simulate how a search engine might synthesize passages of high relevance to a user query by assuming access to a retriever and some way of verifying the output, i.e., by citing sources. Their models operate on retrieved passages and learn to plan and generate summaries with attribution. Plan-based models allow for different forms of attribution, such as citing passages or the plan itself.
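To make these two forms of attribution concrete, here is a minimal sketch under my own formatting assumptions (the paper's fine-tuned models emit plain text, not Python objects): each summary sentence can carry in-line citations to retrieved passages, or point back to the blueprint question it answers.

```python
from dataclasses import dataclass


@dataclass
class AttributedSentence:
    text: str
    plan_q: int          # 1-based index of the blueprint question this sentence answers
    passages: list[int]  # 1-based indices of the retrieved passages it cites


@dataclass
class BlueprintOutput:
    plan: list[str]      # the blueprint: questions the summary should answer
    sentences: list[AttributedSentence]

    def cite_passages(self) -> str:
        """In-line citations to retrieved passages, e.g. '... [2]'."""
        return " ".join(
            f"{s.text} {''.join(f'[{i}]' for i in s.passages)}" for s in self.sentences
        )

    def cite_plan(self) -> str:
        """Alternative attribution: point each sentence to the blueprint question it answers."""
        return " ".join(f"{s.text} (Q{s.plan_q})" for s in self.sentences)


out = BlueprintOutput(
    plan=["How long is the Nile?"],
    sentences=[AttributedSentence("The Nile is roughly 6,650 km long.", 1, [2])],
)
print(out.cite_passages())  # The Nile is roughly 6,650 km long. [2]
print(out.cite_plan())      # The Nile is roughly 6,650 km long. (Q1)
```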
The authors develop automatic methods to annotate training data with plans and citations, and fine-tune several Transformer models to generate attributed text. Experimental results on the AQuAMuSe dataset show that plans consistently improve attribution quality. Furthermore, summary quality improves with an extractive blueprint model. Out-of-domain experiments on the ALCE benchmark show that attribution is a robust skill across information-seeking tasks. The proposed models are competitive with, and sometimes better than, pipelines that heavily rely on large language models.
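As a rough, hypothetical sketch of the annotation idea only (the paper's actual pipeline is more sophisticated than simple lexical overlap), one could link each summary sentence to the passages that best support it and store those links as citation labels for training:

```python
def annotate_citations(summary_sentences: list[str], passages: list[str],
                       min_overlap: float = 0.3) -> list[list[int]]:
    """For each summary sentence, return 1-based indices of passages that likely support it."""
    citations = []
    for sentence in summary_sentences:
        sent_tokens = set(sentence.lower().split())
        supported_by = []
        for idx, passage in enumerate(passages, start=1):
            passage_tokens = set(passage.lower().split())
            overlap = len(sent_tokens & passage_tokens) / max(len(sent_tokens), 1)
            if overlap >= min_overlap:
                supported_by.append(idx)
        citations.append(supported_by)
    return citations


passages = [
    "Mount Everest is 8,849 metres tall.",
    "It lies on the border between Nepal and China.",
]
summary = ["Mount Everest is 8,849 metres tall and sits on the Nepal-China border."]
print(annotate_citations(summary, passages))  # [[1]]: only the first passage clears the threshold
```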
The study also evaluates the effectiveness of different citation formats and finds that in-line citations strike the best balance between answerability and faithfulness. Human evaluation corroborates the automatic attribution results, showing that blueprint citations are more accurate than those produced by LLMs. The abstractive blueprint model is competitive with LLM-based pipelines and outperforms the other systems in terms of attribution. The results suggest that plan-based models are effective at generating text with citations and at improving the faithfulness and factual consistency of summaries. However, the models still make mistakes, particularly in out-of-domain scenarios.