13 Aug 2024 | Muhammad Khalifa†*, David Wadden‡, Emma Strubell‡§, Honglak Lee†, Lu Wang†, Iz Beltagy†, Hao Peng‡*
The paper "Source-Aware Training Enables Knowledge Attribution in Language Models" by Muhammad Khalifa et al. studies intrinsic source citation, in which a large language model (LLM) is required to cite the pretraining sources that support its generated responses. The authors propose source-aware training, a two-stage approach: (i) training the LLM to associate a unique document identifier with the knowledge in each document, and (ii) an instruction-tuning stage that teaches the LLM to cite a supporting pretraining source when prompted. Intrinsic citation of this kind improves LLM transparency, interpretability, and verifiability. Experiments on synthetic data show that the method enables faithful attribution to pretraining data without significantly hurting the model's perplexity. The study also finds that pretraining data augmentation is important for accurate attribution. The authors conclude that these findings can inform future work on training verifiable and trustworthy models.
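To make the two stages concrete, below is a minimal sketch of how source-aware training data might be constructed: stage (i) injects each document's identifier into its pretraining text, and stage (ii) formats instruction-tuning examples that pair an answer with a citation of the supporting document ID. This assumes a simple "append the ID" injection scheme; the function names, the ID format, and the exact templates are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of source-aware data construction.
# All names and formats here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str  # unique document identifier, e.g. a synthetic token "<doc_42>"
    text: str    # the document's content

def make_pretraining_example(doc: Document) -> str:
    """Stage (i): associate a document's knowledge with its identifier
    by appending the ID to the document text during pretraining."""
    return f"{doc.text} Source: {doc.doc_id}"

def make_instruction_example(question: str, answer: str, doc: Document) -> str:
    """Stage (ii): instruction-tuning format that asks the model to
    answer a question and cite the supporting pretraining document."""
    return (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Source: {doc.doc_id}"
    )

if __name__ == "__main__":
    doc = Document(doc_id="<doc_42>",
                   text="The Eiffel Tower is 330 meters tall.")
    print(make_pretraining_example(doc))
    print(make_instruction_example(
        question="How tall is the Eiffel Tower?",
        answer="330 meters.",
        doc=doc,
    ))
```

The data augmentation the paper emphasizes would, under this sketch, amount to repeating each document with its identifier injected in varied ways (for example, at different positions or alongside paraphrases) so the model reliably links content to its source rather than memorizing one fixed surface pattern.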