A Closer Look at the Limitations of Instruction Tuning

2024 | Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Ramaneswaran S, Deepali Aneja, Zeyu Jin, Ramani Duraiswami, Dinesh Manocha
This paper investigates the limitations of instruction tuning (IT) in enhancing the knowledge and capabilities of large language models (LLMs). While IT has become a popular method for aligning LLMs with conversational tasks, the study reveals four critical shortcomings:

1. IT does not effectively add knowledge or skills to LLMs: LoRA fine-tuning (LFT) mainly learns response-initiation and style tokens, while full-parameter fine-tuning (SFT) degrades pre-trained knowledge.
2. Copying response patterns from IT datasets derived from more knowledgeable sources can reduce response quality.
3. SFT increases hallucination by making models inaccurately borrow tokens from similar instances in the IT dataset.
4. Popular methods for improving IT do not outperform a simple LoRA fine-tuned model.

Across open-source IT datasets, responses generated solely from pre-trained knowledge consistently outperform responses from models that learn new knowledge through IT.

Examining how IT transforms the model's output distribution, the authors find that LFT stays closely aligned with the pre-trained distribution, whereas SFT deviates from it substantially. LFT primarily learns how to initiate a response, with most of the answer still drawn from pre-trained knowledge. SFT produces a larger and more uniform distribution shift, which suggests new knowledge acquisition; even so, responses grounded in pre-trained knowledge consistently outperform those based on knowledge newly learned through SFT. A minimal sketch of the two tuning regimes follows.
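To make the comparison concrete, here is a minimal sketch (not the authors' code) of how the two regimes are typically set up with Hugging Face transformers and peft; the base model name, target modules, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the two tuning regimes compared in the paper: LoRA fine-tuning
# (LFT) vs. full-parameter fine-tuning (SFT). Illustrative assumptions only;
# the base model and hyperparameters are not the authors' exact setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# LFT: freeze the base weights and train small low-rank adapters.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lft_model = get_peft_model(base_model, lora_config)
lft_model.print_trainable_parameters()    # typically well under 1% trainable

# SFT: all parameters stay trainable (the default when loading), so the
# pre-trained distribution can shift far more -- the regime the paper
# links to knowledge degradation and hallucination.
sft_model = AutoModelForCausalLM.from_pretrained(model_name)
assert all(p.requires_grad for p in sft_model.parameters())
```

Under LFT the optimizer touches only the small adapter matrices, which is consistent with the paper's observation that LFT mostly learns response style while the content still comes from pre-training.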
Pattern copying, which is most pronounced under SFT, can hurt response quality: models learn to reproduce surface patterns from the IT dataset and end up inserting tokens that do not belong in the answer. This manifests as hallucination, with the borrowed tokens typically originating from instances in the IT dataset that involve similar concepts. A causal analysis confirms that SFT increases hallucination precisely by making models prone to this incorrect borrowing. As a mitigation, the authors propose simplifying the responses in the IT dataset and show that simplified responses reduce hallucination while maintaining factual accuracy.

The study also finds that popular methods for improving IT, such as dataset filtering and NEFTune, do not significantly outperform plain LFT (a sketch of what NEFTune does appears at the end of this summary).

The paper concludes that IT has real limitations: knowledge acquired during pre-training remains more effective than knowledge newly learned through IT. Future work should focus on building more robust and reliable conversational agents that draw on pre-trained knowledge without depending on IT to inject new knowledge. The findings highlight the importance of understanding how LLMs work at a fundamental level and the need for further research into the limits of IT.
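For readers curious what NEFTune actually does, here is a hedged sketch: NEFTune adds uniform noise to the input-embedding outputs during fine-tuning, scaled by alpha / sqrt(seq_len * hidden_dim) as described in the original NEFTune paper. The alpha value and the hook-based integration below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of NEFTune (noisy-embedding instruction tuning), one of
# the IT-improvement methods the study finds does not beat plain LFT.
# The alpha value is illustrative; the scaling follows the NEFTune paper.
import torch

def neftune_hook(module, inputs, output, alpha=5.0):
    """Add scaled uniform noise to embedding outputs while training."""
    if module.training:
        seq_len, hidden_dim = output.shape[-2], output.shape[-1]
        scale = alpha / (seq_len * hidden_dim) ** 0.5
        output = output + torch.zeros_like(output).uniform_(-scale, scale)
    return output  # returning a value replaces the embedding output

# Usage (assuming a Hugging Face causal LM): attach the hook to the
# input-embedding layer, then run the usual fine-tuning loop.
# model.get_input_embeddings().register_forward_hook(neftune_hook)
```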