DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling


17 Jun 2024 | Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria
DELLA-Merging is a model merging technique that reduces interference among expert models through magnitude-based sampling of delta parameters. Its core component, MAGPRUNE, is a novel pruning technique that ranks delta parameters by magnitude and assigns higher dropout probabilities to lower-magnitude parameters. Theoretical analysis shows that rescaling the surviving delta parameters lets the pruned model approximate the original embeddings, which preserves model performance. MAGPRUNE is also a more generic pruning approach that encompasses NODROP, DARE, and TIES as special cases.
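The summary does not give MAGPRUNE's exact sampling formula, so the following is only a minimal PyTorch sketch of the pruning step under stated assumptions: drop probabilities vary linearly with the magnitude rank around a mean drop rate `p` with spread `eps` (parameters are treated per tensor rather than per row), and kept entries are rescaled by the inverse keep probability so the expected delta matches the original. The function name and parameters are illustrative, not the paper's reference implementation.

```python
import torch

def magprune(delta: torch.Tensor, p: float = 0.5, eps: float = 0.1) -> torch.Tensor:
    """Sketch of magnitude-based sampling of a delta tensor (delta = finetuned - base).

    Assumptions: linear rank-to-probability mapping centered at drop rate `p`,
    spread `eps`, and rescaling of survivors by the inverse keep probability.
    """
    flat = delta.flatten()
    n = flat.numel()
    # Rank parameters by magnitude: rank 0 = smallest magnitude.
    ranks = flat.abs().argsort().argsort().float()
    # Lower magnitude -> higher drop probability, in [p - eps/2, p + eps/2].
    drop_prob = (p + eps / 2) - eps * ranks / max(n - 1, 1)
    drop_prob = drop_prob.clamp(0.0, 1.0)
    # Sample which delta parameters to drop.
    mask = (torch.rand_like(flat) >= drop_prob).float()
    # Rescale survivors so the expected pruned delta approximates the original.
    keep_prob = (1.0 - drop_prob).clamp_min(1e-8)
    pruned = flat * mask / keep_prob
    return pruned.view_as(delta)
```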
Pruning is followed by sign-based election of the delta parameters and merging of the elected parameters; a sketch of these two steps appears at the end of this summary. DELLA-Merging outperforms baselines such as DARE and TIES on three of the four merge settings, with an average improvement of 2.4 points over baselines that prune delta parameters and 11.1 points over the no-pruning baseline. Scaling the unpruned delta parameters matters: it improves performance by 7.6 points on the Math+Code model. The method is evaluated on three expert models (LM, Math, Code) and their corresponding benchmarks, AlpacaEval, GSM8K, and MBPP, where DELLA-Merging achieves higher performance than the baselines, obtaining the highest average score on five of the eight merges and the second-best score on two of the three remaining merges. These results indicate that DELLA-Merging reduces interference during merging while maintaining task-specific performance.
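For completeness, here is a similarly hedged sketch of the remaining two steps, sign-based election and merging of the elected delta parameters. It assumes a TIES-style election (per parameter, keep only the deltas whose sign agrees with the sign of the summed deltas) followed by a disjoint mean with a merge scaling factor `lam`; names are illustrative.

```python
import torch

def sign_elect_and_merge(deltas: list, lam: float = 1.0) -> torch.Tensor:
    """Sketch of election and merging for pruned delta tensors, one per expert model.

    Assumptions: the dominant sign is elected from the sum of deltas, deltas with
    a disagreeing sign are discarded, and survivors are averaged per parameter.
    """
    stacked = torch.stack(deltas)                        # (num_models, *shape)
    # Elect the dominant sign per parameter.
    elected_sign = torch.sign(stacked.sum(dim=0))
    # Keep only nonzero deltas whose sign matches the elected sign.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    kept = stacked * agree
    # Disjoint mean: average only over the models that contribute.
    counts = agree.sum(dim=0).clamp_min(1)
    return lam * kept.sum(dim=0) / counts

# Hypothetical usage: merged weight = base + sign_elect_and_merge(
#     [magprune(ft - base) for ft in finetuned_weights])
```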