Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

13 May 2024 | Zorik Gekhman, Gal Yona, Roei Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig
This paper investigates how fine-tuning large language models (LLMs) on new factual knowledge affects their tendency to hallucinate. The study shows that LLMs struggle to acquire genuinely new knowledge through fine-tuning and instead rely largely on knowledge obtained during pre-training; when they do eventually fit the new examples, they become more prone to generating responses that are not grounded in their pre-existing knowledge, i.e., hallucinations.

To measure this, the authors use a controlled question-answering setup and introduce SliCK, a method that categorizes each fine-tuning example into one of four knowledge levels based on how consistently the model produces the correct answer. The results show that examples introducing new knowledge are learned substantially more slowly than examples consistent with the model's existing knowledge, and that as these new examples are eventually learned, the model's tendency to hallucinate increases. Filtering out examples with new knowledge during fine-tuning reduces the risk of overfitting without compromising performance, while examples the model knows only with lower certainty prove essential for handling such cases correctly at inference time. A minimal sketch of this categorization and filtering appears below.

Overall, the findings suggest that LLMs acquire factual knowledge primarily through pre-training, and that fine-tuning mainly teaches them to use this existing knowledge more efficiently. Introducing new knowledge through fine-tuning can therefore have unintended consequences, such as increased hallucination rates, which highlights the importance of carefully managing the fine-tuning data.
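To make the SliCK categorization more concrete, here is a minimal Python sketch of how categorizing examples and filtering out unknown ones could look. The `sample_answers` callable, the sample counts, and the exact-match criterion are illustrative assumptions, not the authors' released implementation, which estimates correctness via few-shot prompting under both greedy decoding and temperature sampling.

```python
# Hypothetical sketch of a SliCK-style knowledge categorization.
# `sample_answers`, the sample counts, and exact-match scoring are assumptions
# for illustration; they do not reproduce the paper's exact setup.
from enum import Enum
from typing import Callable, List, Tuple


class KnowledgeCategory(Enum):
    HIGHLY_KNOWN = "HighlyKnown"   # always correct under greedy decoding
    MAYBE_KNOWN = "MaybeKnown"     # sometimes correct under greedy decoding
    WEAKLY_KNOWN = "WeaklyKnown"   # never correct greedily, sometimes with sampling
    UNKNOWN = "Unknown"            # never correct, even with temperature sampling


def correctness_rate(predictions: List[str], gold: str) -> float:
    """Fraction of sampled predictions matching the gold answer (exact match here)."""
    return sum(p.strip().lower() == gold.strip().lower() for p in predictions) / len(predictions)


def categorize(
    question: str,
    gold_answer: str,
    sample_answers: Callable[[str, float, int], List[str]],  # (question, temperature, n) -> answers
) -> KnowledgeCategory:
    """Assign a SliCK-style category from how often the model produces the gold answer."""
    p_greedy = correctness_rate(sample_answers(question, 0.0, 4), gold_answer)    # greedy decoding
    p_sampled = correctness_rate(sample_answers(question, 0.5, 16), gold_answer)  # temperature sampling

    if p_greedy == 1.0:
        return KnowledgeCategory.HIGHLY_KNOWN
    if p_greedy > 0.0:
        return KnowledgeCategory.MAYBE_KNOWN
    if p_sampled > 0.0:
        return KnowledgeCategory.WEAKLY_KNOWN
    return KnowledgeCategory.UNKNOWN


def filter_unknown(
    dataset: List[Tuple[str, str]],
    sample_answers: Callable[[str, float, int], List[str]],
) -> List[Tuple[str, str]]:
    """Drop Unknown examples before fine-tuning, mirroring the paper's mitigation finding."""
    return [
        (q, a) for q, a in dataset
        if categorize(q, a, sample_answers) is not KnowledgeCategory.UNKNOWN
    ]
```

In this sketch, an example counts as "new knowledge" (Unknown) only if the model never produces the correct answer under either decoding regime, which is the signal the paper uses to decide which fine-tuning examples introduce information the model does not already hold.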