The paper introduces FedPipe, an automated federated pipeline designed to efficiently fine-tune large language models (LLMs) using parameter-efficient fine-tuning (PEFT) techniques. FedPipe addresses the challenges of computational and communication overhead in LLM fine-tuning, especially in the context of federated learning (FL). The key contributions of FedPipe include:
1. **Identification of Important Weights**: FedPipe uses a combination of singular value decomposition (SVD) and sensitivity-based importance scoring to identify the most critical weights for fine-tuning, ensuring that only these weights are updated during training (a sensitivity-scoring sketch follows this list).
2. **Heterogeneous Adapter Configuration**: FedPipe dynamically adjusts the batch size and rank of LoRA adapters based on the computing resources available at each edge server, optimizing training efficiency and reducing the under-training rate (a configuration-selection sketch is shown below).
3. **Quantization for Memory Efficiency**: FedPipe employs NormalFloat (NF) quantization to reduce the memory footprint of pre-trained models, allowing for more efficient training on edge servers with limited GPU memory (an NF4 quantization sketch is shown below).
4. **Efficient Model Aggregation**: FedPipe designs a tailored aggregation method to merge the rank decomposition matrices from different edge servers, ensuring that the aggregated model remains accurate and efficient (an aggregation sketch is shown below).
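To make the weight-identification step (item 1) concrete, below is a minimal sketch of a first-order sensitivity score, assuming importance is measured as the sum of |W ⊙ ∂L/∂W| over each weight matrix after one calibration batch. The `importance_scores` helper and the batch keys are hypothetical; FedPipe's actual criterion also incorporates SVD and may differ in detail.

```python
import torch
import torch.nn as nn

def importance_scores(model: nn.Module, batch, loss_fn):
    """Hypothetical helper: rank weight matrices by a first-order sensitivity
    score, sum(|W * dL/dW|), i.e. an estimate of how much the loss would change
    if the matrix were left un-tuned. FedPipe's scoring also uses SVD and may
    differ in detail."""
    model.zero_grad()
    # Batch keys ("inputs"/"labels") are illustrative placeholders.
    loss = loss_fn(model(batch["inputs"]), batch["labels"])
    loss.backward()

    scores = {}
    for name, param in model.named_parameters():
        if param.dim() == 2 and param.grad is not None:  # weight matrices only
            scores[name] = (param.detach() * param.grad).abs().sum().item()
    # The highest-scoring matrices are the candidates to receive LoRA adapters.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```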
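For the heterogeneous adapter configuration (item 2), one way to picture the per-client decision is a scan over candidate (LoRA rank, batch size) pairs under a memory budget. `pick_adapter_config` and the cost model `est_mem_mb` are invented for illustration; the paper formulates this as a joint optimization problem rather than a brute-force search.

```python
def pick_adapter_config(mem_budget_mb, est_mem_mb,
                        ranks=(4, 8, 16, 32), batch_sizes=(4, 8, 16, 32)):
    """Illustrative per-client choice: among (LoRA rank, batch size) pairs whose
    estimated memory use fits this client's budget, pick the one doing the most
    work per round. est_mem_mb(rank, batch) is a hypothetical cost model."""
    feasible = [(r, b) for r in ranks for b in batch_sizes
                if est_mem_mb(r, b) <= mem_budget_mb]
    # Larger rank * batch means more useful local computation per round.
    return max(feasible, key=lambda rb: rb[0] * rb[1], default=None)


# Example with a made-up linear memory model (numbers are purely illustrative):
config = pick_adapter_config(8_000, lambda r, b: 500 + 20 * r + 150 * b)
```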
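The NormalFloat quantization of item 3 can be sketched as block-wise absmax scaling followed by snapping each value to the nearest of 16 NF4 levels (the quantile codebook published with QLoRA). This is a self-contained illustration, not the paper's or bitsandbytes' implementation.

```python
import torch

# Approximate NF4 levels: quantiles of a standard normal, per the QLoRA paper.
NF4_LEVELS = torch.tensor([
    -1.0000, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0000,
     0.0796,  0.1609,  0.2461,  0.3379,  0.4407,  0.5626,  0.7230,  1.0000])

def nf4_quantize(weight: torch.Tensor, block_size: int = 64):
    """Block-wise NF4 quantization sketch: normalize each block by its absmax,
    then snap every value to the nearest NF4 level. Returns 4-bit codebook
    indices plus per-block scales needed for dequantization."""
    flat = weight.flatten()
    pad = (-flat.numel()) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    normed = blocks / scales                                     # now in [-1, 1]
    idx = (normed.unsqueeze(-1) - NF4_LEVELS).abs().argmin(-1)   # nearest level
    return idx.to(torch.uint8), scales

def nf4_dequantize(idx, scales, shape):
    """Reverse the mapping: look up levels, rescale, and restore the shape."""
    blocks = NF4_LEVELS[idx.long()] * scales
    return blocks.flatten()[: torch.Size(shape).numel()].view(shape)
```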
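Finally, for item 4, a rough picture of merging rank-decomposition matrices from clients with different LoRA ranks: zero-pad each client's A and B factors to the largest rank and take a weighted average. FedPipe's tailored aggregation rule is more involved; this FedAvg-style placeholder only shows the shape bookkeeping.

```python
import torch

def aggregate_lora(client_factors, weights):
    """Placeholder aggregation for heterogeneous-rank LoRA factors.

    client_factors: list of (A, B) pairs, A of shape (r_k, d_in) and B of shape
    (d_out, r_k), where the rank r_k may differ per client. Each factor is
    zero-padded to the largest rank and averaged with per-client weights
    (e.g. local data sizes). FedPipe's tailored rule goes beyond this sketch."""
    r_max = max(A.shape[0] for A, _ in client_factors)
    d_in, d_out = client_factors[0][0].shape[1], client_factors[0][1].shape[0]
    total = float(sum(weights))
    agg_A = torch.zeros(r_max, d_in)
    agg_B = torch.zeros(d_out, r_max)
    for (A, B), w in zip(client_factors, weights):
        agg_A[: A.shape[0]] += (w / total) * A
        agg_B[:, : B.shape[1]] += (w / total) * B
    return agg_A, agg_B  # the server broadcasts these padded factors
```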
Experiments on the BERT and GPT-2 models demonstrate that FedPipe outperforms existing benchmarks in terms of converged accuracy, number of trainable parameters, and convergence rate, under both homogeneous and heterogeneous computing environments. The paper also includes ablation studies to validate the effectiveness of each component of FedPipe.