25 May 2024 | Yuxuan Yan, Shunpu Tang, Zhiguo Shi, Qianqian Yang
FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition
Pre-trained language models (PLMs) face significant challenges due to privacy concerns in centralized training methods. Federated learning (FL) addresses these issues by training models collaboratively across decentralized clients without sharing raw data. However, fine-tuning PLMs in FL is challenging due to their large number of parameters, which leads to substantial communication overhead and computational demands. Parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA), reduce the number of parameters to be updated, improving efficiency. However, the paper's experiments show that PEFT methods can degrade performance when data across clients are non-i.i.d.
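As background, standard LoRA freezes a pre-trained weight matrix W and learns a low-rank update B @ A, with B initialized to zeros and A randomly. A minimal numpy sketch (illustrative only, not the paper's code; dimensions are arbitrary):

```python
import numpy as np

# Illustrative LoRA sketch: the frozen weight W is adapted as W + B @ A,
# and only the low-rank factors A and B are trained during fine-tuning.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4                 # hypothetical layer size and rank
W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, random init (standard LoRA)
B = np.zeros((d_out, r))                   # trainable, zero init (standard LoRA)

# With B = 0, the adapted weight equals W at the start of fine-tuning,
# so training begins exactly from the pre-trained model.
W_adapted = W + B @ A
```

In an FL round, clients would only transmit the small factors A and B (r * (d_in + d_out) values) instead of the full d_out * d_in weight matrix, which is the source of the communication savings.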
To address this issue, the paper proposes FeDeRA, which extends LoRA to the FL setting by initializing low-rank matrices using singular value decomposition (SVD) on pre-trained weight matrices. FeDeRA initializes these matrices more effectively than random sampling or zeros, leading to better performance and faster convergence. Extensive experiments across various tasks and datasets demonstrate that FeDeRA outperforms existing PEFT baselines and is comparable to or better than full-parameter fine-tuning (FFT) in terms of task performance. FeDeRA also reduces training time by over 90% compared to FFT, while maintaining consistent task performance. Additionally, FeDeRA shows robustness against data heterogeneity, maintaining stable performance even as data heterogeneity increases.
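The key change in FeDeRA is the initialization: instead of zeros and random noise, the low-rank factors are derived from an SVD of the pre-trained weight. A hedged numpy sketch in that spirit (the exact factor split is an assumption for illustration, not the authors' released implementation):

```python
import numpy as np

# Sketch of SVD-based initialization in the spirit of FeDeRA: take the
# top-r singular triplets of the pre-trained weight W and split them
# between the low-rank factors B and A.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4                    # hypothetical layer size and rank
W = rng.standard_normal((d_out, d_in))        # stands in for a pre-trained weight

U, S, Vt = np.linalg.svd(W, full_matrices=False)
sqrt_S = np.sqrt(S[:r])                       # split singular values evenly
B = U[:, :r] * sqrt_S                         # (d_out, r)
A = sqrt_S[:, None] * Vt[:r]                  # (r, d_in)

# By Eckart-Young, B @ A is the best rank-r approximation of W, so the
# adapter starts aligned with the principal directions of the pre-trained
# weight rather than at zero.
rel_err = np.linalg.norm(W - B @ A) / np.linalg.norm(W)
```

Starting the adapter in these principal directions, rather than at a zero update, is what the paper credits for faster convergence and better robustness under non-i.i.d. client data.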
The paper discusses related works on FL with non-i.i.d. data and PEFT methods, and provides a detailed methodology for FeDeRA, including its implementation and evaluation. The results highlight FeDeRA's superior performance and efficiency in federated learning settings.