Source-Aware Training Enables Knowledge Attribution in Language Models


2024 | Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
This paper introduces source-aware training, a method that enables large language models (LLMs) to cite the pretraining sources of their generated responses. The approach involves two stages: first, the LLM is trained to associate a unique document identifier with the knowledge in each document; second, instruction tuning teaches the LLM to cite the supporting pretraining sources when prompted. The method fits within existing pretraining/fine-tuning frameworks and requires minimal changes to the model architecture or implementation.

Experiments on synthetic data show that source-aware training can achieve faithful attribution to pretraining data without significantly hurting the model's perplexity relative to standard pretraining. The study highlights the importance of pretraining data augmentation in achieving attribution, and it examines the effect of other training strategies, such as training the model to cite the gold document. The findings suggest that source-aware training can improve the transparency, interpretability, and verifiability of LLMs by enabling them to attribute their responses to pretraining sources. The paper also discusses the limitations of the approach, including its reliance on synthetic data and the added cost of source-aware training. Overall, the study demonstrates that source-aware training can enable LLMs to provide supporting evidence for their responses, enhancing their trustworthiness and reliability.
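To make the two-stage recipe concrete, here is a minimal sketch of how the training data might be constructed. The function names, the `<doc_k>` identifier format, and the prompt template are illustrative assumptions, not the authors' exact implementation; the sketch only shows the core idea of pairing document content with an identifier (including sentence-level repetition as a simple form of the data augmentation the paper finds important) and forming an instruction-tuning example that asks for a citation.

```python
# Illustrative sketch of source-aware data construction.
# All names, ID formats, and templates below are assumptions
# for exposition, not the paper's implementation.

def make_doc_id(index: int) -> str:
    """Create a unique document identifier token, e.g. <doc_42>."""
    return f"<doc_{index}>"

def augment_pretraining_doc(doc: str, doc_id: str, granularity: str = "doc") -> str:
    """Stage 1: associate a document's content with its identifier.

    'doc'      : append the ID once after the full document.
    'sentence' : repeat the ID after every sentence, a simple stand-in
                 for the data augmentation the paper reports as key.
    """
    if granularity == "sentence":
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        return " ".join(f"{s}. {doc_id}" for s in sentences)
    return f"{doc} {doc_id}"

def make_attribution_example(question: str, answer: str, doc_id: str) -> str:
    """Stage 2: an instruction-tuning example that asks the model to
    answer a question and cite the supporting pretraining source."""
    return (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Source: {doc_id}"
    )

if __name__ == "__main__":
    corpus = ["The capital of Atlantis is Poseidonia. It lies under the sea."]
    doc_id = make_doc_id(0)
    print(augment_pretraining_doc(corpus[0], doc_id, granularity="sentence"))
    print(make_attribution_example(
        "What is the capital of Atlantis?", "Poseidonia", doc_id))
```

Because both stages reduce to rewriting training text, no architectural changes are needed: the identifiers are ordinary tokens, and attribution emerges from the model learning to emit the right identifier alongside the knowledge it supports.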