2 Feb 2024 | Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti
This paper introduces SpIEL, a novel sparse fine-tuning method for large language models (LLMs) that achieves high performance with low memory usage. SpIEL iteratively updates the deltas of an active set of parameters, prunes indices that have become obsolete, and regrows new indices based on accumulated gradients or approximate momenta. It is compatible with quantization and with efficient optimizers such as SM3, which enables scaling to larger models. Experiments show that SpIEL outperforms popular parameter-efficient fine-tuning methods such as LoRA while remaining comparable in runtime, and that it is effective for instruction tuning on diverse datasets, including Flan v2, GPT4-Alpaca, and Tulu v2. Two variants are presented, SpIEL-AG and SpIEL-MA, with SpIEL-AG being more memory-efficient and SpIEL-MA offering a trade-off between performance and memory. The method works well in both parameter-efficient and memory-efficient settings and scales to large LLMs. The code for SpIEL and the related experiments is available at the GitHub links provided by the authors.
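To make the update-prune-regrow cycle concrete, here is a minimal PyTorch-style sketch of one such step for a single (flattened) weight matrix. It is an illustration under simplifying assumptions, not SpIEL's actual implementation or API: the function name, the plain SGD update, and the magnitude-based prune / gradient-based regrow criteria are stand-ins for the paper's accumulated-gradient and momentum-based variants.

```python
import torch

# Hypothetical sketch of one prune-and-regrow cycle for sparse fine-tuning.
# `active_idx` holds the flat indices of the currently trainable deltas and
# `delta` their values; all names are illustrative, not SpIEL's real interface.

def sparse_ft_step(weight, active_idx, delta, grad, lr=1e-4, k_swap=64):
    """Update active deltas, prune the least useful, regrow from dense gradients.

    weight:     frozen pretrained parameter, shape (d,)
    active_idx: LongTensor of currently active positions, shape (m,)
    delta:      trainable sparse deltas at those positions, shape (m,)
    grad:       dense gradient of the loss w.r.t. the effective weight, shape (d,)
    """
    # 1) Gradient step on the active deltas only; the dense weight stays frozen.
    delta = delta - lr * grad[active_idx]

    # 2) Prune: drop the k_swap active entries whose deltas have the smallest magnitude.
    keep = torch.topk(delta.abs(), k=delta.numel() - k_swap, largest=True).indices
    active_idx, delta = active_idx[keep], delta[keep]

    # 3) Regrow: activate the k_swap inactive positions with the largest gradient
    #    magnitude (a stand-in for the accumulated-gradient / momentum criteria).
    inactive = torch.ones_like(grad, dtype=torch.bool)
    inactive[active_idx] = False
    candidates = torch.where(inactive)[0]
    grown = candidates[torch.topk(grad[candidates].abs(), k=k_swap).indices]
    active_idx = torch.cat([active_idx, grown])
    delta = torch.cat([delta, torch.zeros(k_swap, dtype=delta.dtype, device=delta.device)])

    # Effective weight for the next forward pass: frozen base plus sparse delta.
    effective = weight.clone()
    effective[active_idx] += delta
    return active_idx, delta, effective
```

Because only the active indices and their deltas (plus optimizer state) need to be stored and updated, the memory footprint stays far below that of full fine-tuning, which is the property the two SpIEL variants trade off against performance.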