This paper explores the limitations of Instruction Tuning (IT), a method used to transform large language models (LLMs) into open-domain conversational agents. While IT has achieved significant success, its shortcomings are underexplored. The study reveals several limitations:
1. **No Knowledge Enhancement**: IT does not add knowledge or skills to LLMs. LoRA fine-tuning learns only response-initiation and style tokens, while full-parameter fine-tuning leads to knowledge degradation (see the sketch after this list).
2. **Pattern Copying**: Copying response patterns from IT datasets derived from knowledgeable sources often leads to a decline in response quality.
3. **Hallucination**: Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset.
4. **Improvement Methods**: Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
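To make the contrast concrete, here is a minimal sketch of the two fine-tuning regimes the paper compares, using Hugging Face `transformers` and `peft`. The model name and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact setup): LoRA vs. full-parameter
# fine-tuning. Model name and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA: freeze the base weights and train small low-rank adapters only.
# Per the paper's finding, this mostly shifts response initiation and style.
lora_cfg = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections (a typical choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(base, lora_cfg)
lora_model.print_trainable_parameters()    # only a tiny fraction of weights train

# Full-parameter fine-tuning: every weight receives gradients, which the
# paper associates with knowledge degradation and increased hallucination.
full_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
for p in full_model.parameters():
    p.requires_grad = True
```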
The findings suggest that responses generated solely from pre-trained knowledge consistently outperform those learned from IT on open-source datasets. The paper aims to inspire future research in addressing these challenges.