Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

2024 | Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen
This paper explores the application of zeroth-order (ZO) optimization in fine-tuning large language models (LLMs), addressing the significant memory overhead caused by back-propagation (BP) in first-order (FO) optimization methods like SGD and Adam. The authors propose a shift towards BP-free ZO optimization to reduce memory costs during LLM fine-tuning. They conduct a comprehensive benchmarking study across five LLM families, three task complexities, and five fine-tuning schemes, revealing previously overlooked optimization principles such as task alignment and the role of forward gradient methods. The study also introduces novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity, aiming to improve both accuracy and efficiency. The findings highlight the potential of ZO optimization for achieving memory-efficient LLM fine-tuning, with practical implications for on-device training and other resource-constrained environments.
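To make the BP-free idea concrete, the sketch below shows the two-point (SPSA-style) zeroth-order gradient estimate that this family of methods builds on: the gradient is approximated from two forward-only loss evaluations along a random perturbation direction, so no back-propagation graph, activations, or gradient buffers are needed. This is an illustrative NumPy toy, not the paper's implementation; the function name zo_sgd_step, the toy quadratic loss, and the hyperparameter values are assumptions chosen for the example.

```python
import numpy as np

def zo_sgd_step(params, loss_fn, lr=0.05, mu=1e-3, seed=None):
    """One zeroth-order SGD step using a two-point (SPSA-style) gradient estimate.

    Only two forward evaluations of `loss_fn` are needed; no back-propagation,
    so no activation or gradient buffers are ever materialized.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)            # random perturbation direction
    loss_plus = loss_fn(params + mu * z)             # forward pass at theta + mu*z
    loss_minus = loss_fn(params - mu * z)            # forward pass at theta - mu*z
    grad_scale = (loss_plus - loss_minus) / (2 * mu) # projected directional derivative
    return params - lr * grad_scale * z              # SGD-style update along z

# Toy usage: minimize a quadratic without computing an analytic gradient.
theta = np.ones(10)
quadratic = lambda p: float(np.sum(p ** 2))
for step in range(200):
    theta = zo_sgd_step(theta, quadratic, seed=step)
print(quadratic(theta))  # loss should be far below the starting value of 10.0
```

Because each step needs only forward evaluations, the memory footprint is essentially that of inference; MeZO-style implementations go further by regenerating z from a saved random seed, so the perturbation never has to be stored alongside the model parameters.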