Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

24 Feb 2024 | Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You
This paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization method for fine-tuning large language models (LLMs). Inspired by Parameter-Efficient Fine-Tuning (PEFT), Sparse MeZO applies zeroth-order optimization only to a carefully selected subset of parameters, significantly improving performance and convergence speed while maintaining low memory usage. The proposed method is evaluated on various tasks using LLaMA and OPT models, demonstrating a 9% absolute accuracy improvement and a 3.5x speedup over vanilla MeZO on the RTE task. The paper also provides a theoretical analysis of the convergence rate of Sparse MeZO and discusses the effectiveness of sparse masks in improving performance. Additionally, it introduces an efficient implementation that calculates the sparse mask during the forward pass, further reducing memory consumption. Experimental results show that Sparse MeZO consistently outperforms MeZO in terms of both accuracy and convergence speed, making it a promising approach for fine-tuning LLMs with limited computational resources.
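To make the idea concrete, below is a minimal PyTorch sketch of what a sparse, SPSA-style zeroth-order update could look like: only parameters selected by a binary mask are perturbed, two forward passes estimate the directional derivative, and the random direction is regenerated from a seed rather than stored (MeZO's memory trick). This is not the authors' code; `sparse_mezo_step`, `loss_fn`, and the `masks` dictionary are illustrative assumptions, and the mask-selection rule from the paper is not reproduced here.

```python
import torch

def sparse_mezo_step(model, loss_fn, batch, masks, eps=1e-3, lr=1e-6, seed=None):
    """One sparse zeroth-order (SPSA-style) update.

    masks: dict mapping parameter name -> binary tensor of the same shape,
    selecting the subset of weights allowed to move. loss_fn(model, batch)
    is assumed to return a scalar loss. All names here are illustrative.
    """
    if seed is None:
        seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Re-seed so the same random direction z is regenerated on every call,
        # avoiding storage of a full copy of z.
        torch.manual_seed(seed)
        for name, p in model.named_parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z * masks[name])

    with torch.no_grad():
        perturb(+1)                              # theta + eps * (m ⊙ z)
        loss_plus = loss_fn(model, batch)
        perturb(-2)                              # theta - eps * (m ⊙ z)
        loss_minus = loss_fn(model, batch)
        perturb(+1)                              # restore original theta

        # Finite-difference estimate of the gradient projected onto m ⊙ z.
        grad_scale = (loss_plus - loss_minus) / (2 * eps)

        torch.manual_seed(seed)
        for name, p in model.named_parameters():
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_scale * z * masks[name])

    return loss_plus
```

In this sketch the update touches only masked coordinates, which is the key difference from vanilla MeZO; the paper's efficient implementation additionally derives the mask on the fly during the forward pass instead of keeping per-parameter mask tensors in memory.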