2 Feb 2024 | Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti
This paper introduces SpIEL, a novel sparse fine-tuning method for large language models (LLMs) that achieves high performance with low memory usage. SpIEL iteratively updates the deltas of an active set of parameters, prunes indices that have become obsolete, and regrows new indices based on accumulated gradients or approximate momenta. It is compatible with quantization and with efficient optimizers such as SM3, which enables scaling to larger models. Experiments show that SpIEL outperforms popular parameter-efficient fine-tuning methods such as LoRA while remaining comparable in runtime, and that it is effective for instruction tuning on diverse datasets, including Flan v2, GPT4-Alpaca, and Tulu v2. Two variants are presented, SpIEL-AG and SpIEL-MA, with SpIEL-AG being more memory-efficient and SpIEL-MA offering a trade-off between performance and memory. The method works well in both parameter-efficient and memory-efficient settings and scales to large LLMs. The code for SpIEL and the related experiments is available at the GitHub links provided by the authors.
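To make the update-prune-regrow cycle concrete, here is a minimal PyTorch-style sketch of one such step for a single (flattened) weight matrix. It is an illustration under simplifying assumptions, not SpIEL's actual implementation or API: the function name, the plain SGD update, and the magnitude-based prune / gradient-based regrow criteria are stand-ins for the paper's accumulated-gradient and momentum-based variants.

```python
import torch

# Hypothetical sketch of one prune-and-regrow cycle for sparse fine-tuning.
# `active_idx` holds the flat indices of the currently trainable deltas and
# `delta` their values; all names are illustrative, not SpIEL's real interface.

def sparse_ft_step(weight, active_idx, delta, grad, lr=1e-4, k_swap=64):
    """Update active deltas, prune the least useful, regrow from dense gradients.

    weight:     frozen pretrained parameter, shape (d,)
    active_idx: LongTensor of currently active positions, shape (m,)
    delta:      trainable sparse deltas at those positions, shape (m,)
    grad:       dense gradient of the loss w.r.t. the effective weight, shape (d,)
    """
    # 1) Gradient step on the active deltas only; the dense weight stays frozen.
    delta = delta - lr * grad[active_idx]

    # 2) Prune: drop the k_swap active entries whose deltas have the smallest magnitude.
    keep = torch.topk(delta.abs(), k=delta.numel() - k_swap, largest=True).indices
    active_idx, delta = active_idx[keep], delta[keep]

    # 3) Regrow: activate the k_swap inactive positions with the largest gradient
    #    magnitude (a stand-in for the accumulated-gradient / momentum criteria).
    inactive = torch.ones_like(grad, dtype=torch.bool)
    inactive[active_idx] = False
    candidates = torch.where(inactive)[0]
    grown = candidates[torch.topk(grad[candidates].abs(), k=k_swap).indices]
    active_idx = torch.cat([active_idx, grown])
    delta = torch.cat([delta, torch.zeros(k_swap, dtype=delta.dtype, device=delta.device)])

    # Effective weight for the next forward pass: frozen base plus sparse delta.
    effective = weight.clone()
    effective[active_idx] += delta
    return active_idx, delta, effective
```

Because only the active indices and their deltas (plus optimizer state) need to be stored and updated, the memory footprint stays far below that of full fine-tuning, which is the property the two SpIEL variants trade off against performance.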