13 May 2024 | Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig
This paper investigates how fine-tuning large language models (LLMs) on new factual knowledge affects their tendency to hallucinate. The authors propose SliCK, a method that categorizes facts into four knowledge categories according to the model's confidence in generating the correct answer (a minimal sketch of this categorization follows below). They conduct a controlled study on closed-book question answering, varying the proportion of fine-tuning examples that introduce knowledge the model lacks. The results show that LLMs struggle to acquire new factual knowledge through fine-tuning: examples that introduce new knowledge are learned significantly more slowly than examples consistent with the model's existing knowledge. However, once the model does eventually learn the new examples, it becomes more prone to hallucination. The study also highlights the importance of early stopping to mitigate this overfitting, as well as the potential benefit of filtering out new-knowledge examples from the fine-tuning data to reduce hallucinations. Overall, the findings suggest that LLMs acquire factual knowledge primarily during pre-training, and that fine-tuning may be more effective as a way to teach the model to better use the knowledge it already has.
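The summary does not spell out how SliCK assigns its four categories, so the following is a minimal sketch of one plausible implementation: classify each (question, answer) fact by how often the model answers correctly under greedy decoding versus temperature sampling. The category names (`HighlyKnown`, `MaybeKnown`, `WeaklyKnown`, `Unknown`), the exact-match check, and the sampling protocol are assumptions for illustration, not the paper's verbatim recipe.

```python
from typing import List

def slick_category(greedy_answers: List[str],
                   sampled_answers: List[str],
                   gold_answer: str) -> str:
    """Assign a SliCK-style knowledge category to one (question, answer) fact.

    `greedy_answers` holds the model's greedy-decoded answers (e.g. under
    several few-shot prompts); `sampled_answers` holds temperature-sampled
    answers. The prompting/sampling protocol and category names here are
    assumptions made for illustration.
    """
    def correct(ans: str) -> bool:
        # Simple exact-match check; the paper's answer matching may be softer.
        return ans.strip().lower() == gold_answer.strip().lower()

    greedy_hits = sum(correct(a) for a in greedy_answers)
    sampled_hits = sum(correct(a) for a in sampled_answers)

    if greedy_hits == len(greedy_answers) and greedy_hits > 0:
        return "HighlyKnown"   # greedy decoding is always correct
    if greedy_hits > 0:
        return "MaybeKnown"    # greedy decoding is sometimes correct
    if sampled_hits > 0:
        return "WeaklyKnown"   # correct only under temperature sampling
    return "Unknown"           # the model never produces the correct answer

# Example usage with stubbed model outputs (hypothetical data):
cat = slick_category(
    greedy_answers=["paris", "Paris", "Lyon", "Paris"],
    sampled_answers=["Paris", "Marseille"],
    gold_answer="Paris",
)
print(cat)  # -> "MaybeKnown": greedy is sometimes, but not always, correct
```

Under a scheme like this, the filtering intervention described above would amount to dropping every fine-tuning example whose fact falls into the `Unknown` category before training.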