LoFiT: Localized Fine-tuning on LLM Representations
**Authors:** Fangcong Yin, Xi Ye, Greg Durrett
**Institution:** The University of Texas at Austin
**Abstract:**
Recent work in interpretability has shown that large language models (LLMs) can be adapted for new tasks in a learning-free way by intervening directly on their representations. This paper introduces LoFiT, a localized fine-tuning method that identifies the subset of attention heads most important for a given task and learns offset vectors that are added to those heads' representations. LoFiT is effective while modifying only a sparse set of heads (3% of the total) and using limited training data. It outperforms representation intervention methods such as Inference-time Intervention (ITI) and is comparable to parameter-efficient fine-tuning (PEFT) methods such as LoRA despite learning substantially fewer parameters. The study also highlights the importance of localization: intervening on task-specific attention heads improves performance, and LoFiT generalizes well to out-of-domain tasks.
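The intervention itself is lightweight: at each selected attention head, a trainable offset vector is added to that head's output representation, and only these offsets are trained. Below is a minimal PyTorch sketch of the idea; the module name, tensor layout, and dimensions are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration (not the paper's exact setup).
NUM_HEADS, HEAD_DIM = 32, 128

class OffsetIntervention(nn.Module):
    """Adds a learned offset vector to the outputs of selected attention heads.

    `selected_heads` holds the head indices chosen in LoFiT's selection step;
    only those heads receive a trainable offset, all others pass through.
    """
    def __init__(self, selected_heads):
        super().__init__()
        self.selected_heads = selected_heads
        # One trainable offset vector per selected head, initialized to zero
        # so training starts from the unmodified model.
        self.offsets = nn.ParameterDict({
            str(h): nn.Parameter(torch.zeros(HEAD_DIM)) for h in selected_heads
        })

    def forward(self, head_outputs):
        # head_outputs: (batch, seq_len, num_heads, head_dim)
        out = head_outputs.clone()
        for h in self.selected_heads:
            out[:, :, h, :] = out[:, :, h, :] + self.offsets[str(h)]
        return out
```

In practice such a module would be attached (e.g., via a forward hook) at each layer containing selected heads, with the base model's weights kept frozen.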
**Key Contributions:**
1. **LoFiT Method:** A localized fine-tuning method that first selects a small, task-specific subset of attention heads and then learns offset vectors added to their representations (see the head-selection sketch after this list).
2. **Effectiveness:** LoFiT achieves competitive performance on truthfulness and reasoning tasks, outperforming ITI and matching LoRA with fewer parameters.
3. **Localization Importance:** The choice of heads matters: fine-tuning the task-specific heads LoFiT selects outperforms fine-tuning alternatives such as randomly chosen heads.
4. **Generalization:** LoFiT shows good out-of-domain generalization, maintaining or improving performance on unseen tasks.
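For the selection step, the paper describes learning per-head scaling factors with a sparsity penalty and ranking heads by the norms of those factors. The sketch below follows that description; the variable names, the exact form of the regularizer, and the dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration.
NUM_LAYERS, NUM_HEADS, HEAD_DIM = 32, 32, 128

# One trainable scaling vector per (layer, head), initialized to ones so the
# network is unchanged at the start; during selection, each head's output z
# is rescaled elementwise by its vector.
scalers = nn.Parameter(torch.ones(NUM_LAYERS, NUM_HEADS, HEAD_DIM))

def selection_loss(task_loss: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # Task loss plus a sparsity penalty that keeps unimportant heads' scaling
    # factors near their initialization. (The exact regularizer is an
    # assumption; the paper describes an L1-style sparsity term.)
    return task_loss + lam * (scalers - 1.0).abs().sum()

def top_k_heads(k: int):
    # Score each head by the norm of its learned scaling vector, then return
    # the (layer, head) indices of the k highest-scoring heads.
    scores = scalers.detach().norm(dim=-1)  # (NUM_LAYERS, NUM_HEADS)
    flat = scores.flatten().topk(k).indices.tolist()
    return [(i // NUM_HEADS, i % NUM_HEADS) for i in flat]

# E.g., keeping roughly 3% of all heads:
selected = top_k_heads(max(1, int(0.03 * NUM_LAYERS * NUM_HEADS)))
```

The selected heads then receive the trainable offsets shown in the earlier sketch, while all other parameters stay frozen.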
**Experiments:**
- **TruthfulQA:** Evaluates truthfulness in responses.
- **CLUTRR:** Assesses multi-hop reasoning over family relationships.
- **MQuAKE:** Tests knowledge editing via multi-hop questions over counterfactual edits.
**Results:**
- LoFiT outperforms ITI and matches or exceeds PEFT methods such as LoRA across settings and models, while learning far fewer parameters.
- Task-specific attention heads consistently improve performance.
- LoFiT generalizes well to out-of-domain tasks, showing less overfitting compared to other methods.
**Conclusion:**
LoFiT is a powerful tool for fine-tuning LLMs, combining the benefits of localization and parameter efficiency. It demonstrates the potential of interpretability insights for improving how LLMs are fine-tuned.