LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

May 28, 2024 | Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
This paper introduces LISA, a memory-efficient optimization method for large language model (LLM) fine-tuning. LISA is motivated by the observation that LoRA, a parameter-efficient fine-tuning technique, exhibits heavily skewed weight norms across layers. Leveraging this observation, LISA samples layers according to their importance, enabling training with less memory consumption than LoRA: during optimization it randomly freezes most middle layers and updates only the essential ones. Experimental results show that LISA outperforms both LoRA and full-parameter training on a variety of downstream tasks, achieving significant gains on metrics such as MT-Bench scores.
LISA is particularly effective on instruction-following tasks and demonstrates better convergence behavior than LoRA. The method scales to large models such as LLaMA-2-70B and shows consistent performance improvements across domains and model sizes. Its memory efficiency and performance benefits make LISA a promising alternative to LoRA for LLM fine-tuning.
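The layer-selection step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the first layer (embedding) and last layer (LM head) are always trainable, samples a fixed number of middle layers uniformly at random each sampling period, and freezes the rest. The function and field names are illustrative.

```python
import random

def sample_active_layers(num_layers, num_active, rng=random):
    """Pick the set of layer indices to keep trainable for this period.

    LISA-style selection (sketch): first and last layers are always
    active; `num_active` middle layers are drawn uniformly at random,
    and every other middle layer is frozen.
    """
    middle = list(range(1, num_layers - 1))
    active = set(rng.sample(middle, k=min(num_active, len(middle))))
    active.update({0, num_layers - 1})
    return active

def apply_freezing(layers, active):
    """Toggle a per-layer trainability flag (stand-in for setting
    `requires_grad` on a layer's parameters in a real framework)."""
    for i, layer in enumerate(layers):
        layer["requires_grad"] = i in active

# Illustrative usage: 8 layers, 2 middle layers unfrozen per period.
layers = [{"requires_grad": False} for _ in range(8)]
active = sample_active_layers(num_layers=8, num_active=2,
                              rng=random.Random(0))
apply_freezing(layers, active)
```

In a real fine-tuning loop, a new set of active layers would be resampled every fixed number of optimizer steps, so that all middle layers are eventually updated while only a few hold optimizer state at any one time.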