Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark


2024 | Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen
This paper explores the application of zeroth-order (ZO) optimization in fine-tuning large language models (LLMs), addressing the significant memory overhead caused by back-propagation (BP) in first-order (FO) optimization methods like SGD and Adam. The authors propose a shift towards BP-free ZO optimization to reduce memory costs during LLM fine-tuning. They conduct a comprehensive benchmarking study across five LLM families, three task complexities, and five fine-tuning schemes, revealing previously overlooked optimization principles such as task alignment and the role of forward gradient methods. The study also introduces novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity, aiming to improve both accuracy and efficiency. The findings highlight the potential of ZO optimization for achieving memory-efficient LLM fine-tuning, with practical implications for on-device training and other resource-constrained environments.
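To make the BP-free idea concrete, below is a minimal sketch of a two-point (SPSA-style) ZO-SGD step of the kind the benchmark studies: the loss is probed at two randomly perturbed parameter settings using forward passes only, and the scalar loss difference scales a shared random direction to form the update. The function names (`zo_sgd_step`, `loss_fn`) and hyperparameter values are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def zo_sgd_step(model, loss_fn, batch, lr=1e-6, mu=1e-3, seed=0):
    """One zeroth-order SGD step with a two-point (SPSA-style) gradient estimate.

    The loss is evaluated at theta + mu*z and theta - mu*z for a random
    direction z, so no backward pass (and no activation storage) is needed.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same random direction z on demand,
        # so z never has to be stored alongside the parameters.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, dtype=p.dtype).to(p.device)
            p.data.add_(scale * mu * z)

    with torch.no_grad():
        perturb(+1.0)                       # theta + mu*z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                       # theta - mu*z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                       # restore theta

        # Scalar coefficient of the estimated gradient: grad ≈ coeff * z
        coeff = (loss_plus - loss_minus) / (2 * mu)

        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, dtype=p.dtype).to(p.device)
            p.data.add_(-lr * coeff * z)

    return float(loss_plus)
```

Because only forward passes and a shared random seed are needed, the peak memory cost stays close to that of inference, which is the property that makes ZO methods attractive for on-device fine-tuning.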