FLAME: Factuality-Aware Alignment for Large Language Models


2 May 2024 | Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, and Xilun Chen
The paper "FLAME+: Factuality-Aware Alignment for Large Language Models" addresses the issue of hallucination in large language models (LLMs) during the alignment process, which is crucial for enhancing their factual accuracy. The authors identify that the conventional alignment process, consisting of supervised fine-tuning (SFT) and reinforcement learning (RL), can lead to more false facts due to the introduction of unfamiliar knowledge and the preference for detailed responses. To mitigate this, they propose factuality-aware alignment (FLAME+), which includes factuality-aware SFT and factuality-aware RL through direct preference optimization. The key contributions are: 1. **Factuality-Aware SFT**: This approach elicits knowledge from the pre-trained LLM itself by generating responses with few-shot demonstrations, avoiding the introduction of unknown information. 2. **Factuality-Aware RL**: This approach creates additional preference pairs focused on factuality for fact-based instructions, combining them with standard preference pairs for instruction following during Direct Preference Optimization (DPO). Experiments on the Alpaca Eval and Biography datasets show that FLAME+ significantly improves the factual accuracy of LLMs while maintaining or even enhancing their instruction-following capability. The authors also conduct a pilot study to validate their findings and provide ablation studies to demonstrate the effectiveness of their proposed methods.The paper "FLAME+: Factuality-Aware Alignment for Large Language Models" addresses the issue of hallucination in large language models (LLMs) during the alignment process, which is crucial for enhancing their factual accuracy. The authors identify that the conventional alignment process, consisting of supervised fine-tuning (SFT) and reinforcement learning (RL), can lead to more false facts due to the introduction of unfamiliar knowledge and the preference for detailed responses. To mitigate this, they propose factuality-aware alignment (FLAME+), which includes factuality-aware SFT and factuality-aware RL through direct preference optimization. The key contributions are: 1. **Factuality-Aware SFT**: This approach elicits knowledge from the pre-trained LLM itself by generating responses with few-shot demonstrations, avoiding the introduction of unknown information. 2. **Factuality-Aware RL**: This approach creates additional preference pairs focused on factuality for fact-based instructions, combining them with standard preference pairs for instruction following during Direct Preference Optimization (DPO). Experiments on the Alpaca Eval and Biography datasets show that FLAME+ significantly improves the factual accuracy of LLMs while maintaining or even enhancing their instruction-following capability. The authors also conduct a pilot study to validate their findings and provide ablation studies to demonstrate the effectiveness of their proposed methods.