LoFiT: Localized Fine-tuning on LLM Representations
**Authors:** Fangcong Yin, Xi Ye, Greg Durrett
**Institution:** The University of Texas at Austin
**Abstract:**
Recent work in interpretability has shown that large language models (LLMs) can be adapted for new tasks in a learning-free way by intervening directly on their representations. This paper introduces LoFiT, a localized fine-tuning method that identifies the subset of attention heads most important for a given task and learns offset vectors that are added to those heads' representations. LoFiT is effective while modifying only a sparse set of heads (3% of the total) and using limited training data. It outperforms representation intervention methods such as Inference-time Intervention (ITI) and is comparable to parameter-efficient fine-tuning (PEFT) methods such as LoRA despite learning substantially fewer parameters. The study also highlights the importance of localization: intervening on task-specific attention heads improves performance, and LoFiT generalizes well to out-of-domain tasks.
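The intervention itself is lightweight: at each selected attention head, a trainable offset vector is added to that head's output representation, and only these offsets are trained. Below is a minimal PyTorch sketch of the idea; the module name, tensor layout, and dimensions are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration (not the paper's exact setup).
NUM_HEADS, HEAD_DIM = 32, 128

class OffsetIntervention(nn.Module):
    """Adds a learned offset vector to the outputs of selected attention heads.

    `selected_heads` holds the head indices chosen in LoFiT's selection step;
    only those heads receive a trainable offset, all others pass through.
    """
    def __init__(self, selected_heads):
        super().__init__()
        self.selected_heads = selected_heads
        # One trainable offset vector per selected head, initialized to zero
        # so training starts from the unmodified model.
        self.offsets = nn.ParameterDict({
            str(h): nn.Parameter(torch.zeros(HEAD_DIM)) for h in selected_heads
        })

    def forward(self, head_outputs):
        # head_outputs: (batch, seq_len, num_heads, head_dim)
        out = head_outputs.clone()
        for h in self.selected_heads:
            out[:, :, h, :] = out[:, :, h, :] + self.offsets[str(h)]
        return out
```

In practice such a module would be attached (e.g., via a forward hook) at each layer containing selected heads, with the base model's weights kept frozen.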
**Key Contributions:**
1. **LoFiT Method:** A localized fine-tuning method that first selects a small, task-specific subset of attention heads and then learns offset vectors added to their representations (see the head-selection sketch after this list).
2. **Effectiveness:** LoFiT achieves competitive performance on truthfulness and reasoning tasks, outperforming ITI and matching LoRA with fewer parameters.
3. **Localization Importance:** The choice of heads matters: fine-tuning the task-specific heads LoFiT selects outperforms fine-tuning alternatives such as randomly chosen heads.
4. **Generalization:** LoFiT shows good out-of-domain generalization, maintaining or improving performance on unseen tasks.
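For the selection step, the paper describes learning per-head scaling factors with a sparsity penalty and ranking heads by the norms of those factors. The sketch below follows that description; the variable names, the exact form of the regularizer, and the dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration.
NUM_LAYERS, NUM_HEADS, HEAD_DIM = 32, 32, 128

# One trainable scaling vector per (layer, head), initialized to ones so the
# network is unchanged at the start; during selection, each head's output z
# is rescaled elementwise by its vector.
scalers = nn.Parameter(torch.ones(NUM_LAYERS, NUM_HEADS, HEAD_DIM))

def selection_loss(task_loss: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # Task loss plus a sparsity penalty that keeps unimportant heads' scaling
    # factors near their initialization. (The exact regularizer is an
    # assumption; the paper describes an L1-style sparsity term.)
    return task_loss + lam * (scalers - 1.0).abs().sum()

def top_k_heads(k: int):
    # Score each head by the norm of its learned scaling vector, then return
    # the (layer, head) indices of the k highest-scoring heads.
    scores = scalers.detach().norm(dim=-1)  # (NUM_LAYERS, NUM_HEADS)
    flat = scores.flatten().topk(k).indices.tolist()
    return [(i // NUM_HEADS, i % NUM_HEADS) for i in flat]

# E.g., keeping roughly 3% of all heads:
selected = top_k_heads(max(1, int(0.03 * NUM_LAYERS * NUM_HEADS)))
```

The selected heads then receive the trainable offsets shown in the earlier sketch, while all other parameters stay frozen.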
**Experiments:**
- **TruthfulQA:** Evaluates truthfulness in responses.
- **CLUTRR:** Assesses multi-hop reasoning over family relationships.
- **MQuAKE:** Tests knowledge editing via multi-hop questions over counterfactual edits.
**Results:**
- LoFiT outperforms ITI and matches or exceeds PEFT methods such as LoRA across settings and models, while learning far fewer parameters.
- Task-specific attention heads consistently improve performance.
- LoFiT generalizes well to out-of-domain tasks, showing less overfitting compared to other methods.
**Conclusion:**
LoFiT is a powerful tool for fine-tuning LLMs, combining the benefits of localization and parameter efficiency. It demonstrates the potential of interpretability insights for improving how LLMs are fine-tuned.