Improving Attributed Text Generation of Large Language Models via Preference Learning

27 Mar 2024 | Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang
This paper introduces the Automatic Preference Optimization (APO) framework for improving attributed text generation in large language models (LLMs). To address the challenge of generating reliable, verifiable content, the authors model attribution as a preference learning problem. APO combines a post-training procedure that grounds the base model for attribution with a preference optimization procedure that targets both generation hallucination and attribution hallucination.

For post-training, the authors curate 6,330 examples from existing datasets to teach the model to generate answers with citations. To avoid the cost of manually labeling preference data, they synthesize attribution preference data automatically, producing 95,263 pairs; preference pairs are constructed by considering the relevance of retrieved passages and whether the cited passages support each statement.

Inspired by how humans cite sources, the authors further propose a progressive preference optimization method that exploits fine-grained information, addressing the sparse reward problem and improving the model's ability to generate accurate, well-supported statements. Because the automatically generated preference data induces a distribution shift, the method uses experience replay to alleviate the resulting over-fitting and text degradation.

The framework is evaluated on three datasets: ASQA, StrategyQA, and ELI5. APO achieves state-of-the-art citation F1 scores and improved response quality compared to existing baselines, and it reduces both generation and attribution hallucinations.

The contributions are threefold: the authors are, to their knowledge, the first to apply preference learning to attribution tasks; they establish a full data collection pipeline for attribution; and they propose a progressive preference optimization method that leverages fine-grained signals to mitigate sparse rewards.
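The automatic preference-pair synthesis step can be pictured with a small sketch. The paper's exact scoring models and thresholds are not reproduced here; the snippet below assumes a hypothetical NLI-style checker (`nli.entailment_prob`) that judges whether a cited passage supports a statement, and the `Response` fields and margin rule are illustrative rather than the paper's data format.

```python
# Minimal sketch of automatic attribution-preference-pair construction.
# Assumptions (not from the paper): an NLI-style checker scores whether a cited
# passage supports a statement; responses whose citations are better supported
# are preferred over responses whose citations are not.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Response:
    text: str                  # generated answer with inline citations
    statements: List[str]      # individual claims extracted from the answer
    cited_passages: List[str]  # passage cited for each claim (aligned by index)

def support_score(statement: str, passage: str, nli) -> float:
    """Probability that the cited passage entails the statement (hypothetical NLI wrapper)."""
    return nli.entailment_prob(premise=passage, hypothesis=statement)

def attribution_score(resp: Response, nli) -> float:
    """Average support over all claims; 0.0 if the response cites nothing."""
    if not resp.statements:
        return 0.0
    scores = [support_score(s, p, nli) for s, p in zip(resp.statements, resp.cited_passages)]
    return sum(scores) / len(scores)

def make_preference_pair(resp_a: Response, resp_b: Response, nli,
                         margin: float = 0.2) -> Optional[Tuple[Response, Response]]:
    """Return (chosen, rejected) if the attribution gap is large enough, else None."""
    score_a, score_b = attribution_score(resp_a, nli), attribution_score(resp_b, nli)
    if abs(score_a - score_b) < margin:
        return None  # too close to call; skip ambiguous pairs
    return (resp_a, resp_b) if score_a > score_b else (resp_b, resp_a)
```

In this kind of pipeline, the margin filter trades data volume for label reliability: widening it keeps only pairs where the preferred response is clearly better attributed.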
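The training side of the framework can likewise be sketched. The summary does not spell out the exact objective, so the code below is a minimal sketch assuming a DPO-style preference loss on the synthesized pairs, combined with a replayed supervised loss on the curated post-training data to counter distribution shift; `policy.sequence_logprob` and `policy.lm_loss` are assumed interfaces, not a real library API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO objective on (chosen, rejected) per-sequence log-probabilities."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    return -F.logsigmoid(logits).mean()

def training_step(pref_batch, replay_batch, policy, ref_model,
                  replay_weight: float = 0.5, beta: float = 0.1):
    """One optimization step: preference loss on synthesized pairs plus a replayed
    supervised (post-training) loss to limit over-fitting and text degradation.
    `policy` / `ref_model` are assumed to expose per-sequence log-probs and an LM loss."""
    with torch.no_grad():
        ref_c = ref_model.sequence_logprob(pref_batch["chosen"])
        ref_r = ref_model.sequence_logprob(pref_batch["rejected"])
    pol_c = policy.sequence_logprob(pref_batch["chosen"])
    pol_r = policy.sequence_logprob(pref_batch["rejected"])
    pref = dpo_loss(pol_c, pol_r, ref_c, ref_r, beta)
    replay = policy.lm_loss(replay_batch)  # standard next-token loss on curated data
    return pref + replay_weight * replay
```

Mixing the replayed supervised term with the preference term is one plausible way to keep the policy anchored to the post-training distribution while it learns from automatically generated preferences.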
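Finally, the citation F1 metric used in the evaluation can be sketched in a few lines. The snippet assumes per-statement support judgments (for recall) and per-citation relevance judgments (for precision) produced by an NLI-style checker, as in common attribution evaluation setups; it is not the paper's exact scorer.

```python
def citation_f1(recall_judgments, precision_judgments):
    """Citation F1 from binary (0/1) judgments: recall_judgments marks whether each
    statement is supported by its citations; precision_judgments marks whether each
    citation is relevant. A minimal sketch of NLI-based attribution evaluation."""
    recall = sum(recall_judgments) / len(recall_judgments) if recall_judgments else 0.0
    precision = sum(precision_judgments) / len(precision_judgments) if precision_judgments else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```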