Localizing Task Information for Improved Model Merging and Compression

2024 | Ke Wang*, Nikolaos Dimitriadis*, Guillermo Ortiz-Jiménez, François Fleuret, Pascal Frossard
This paper addresses the issue of performance degradation in model merging and task arithmetic, which are promising approaches to merge multiple single-task checkpoints into a multi-task model. The authors propose TALL-masks, a method to identify task-specific information in the multi-task vector, which retains >99% of single-task accuracy. They also introduce Consensus Merging, an algorithm that eliminates *selfish* and *catastrophic* weights, improving the performance of existing model merging methods. Experiments on vision and NLP benchmarks show that Consensus Merging consistently outperforms prior methods, reducing storage from 57GB to 8.2GB while retaining 99.7% of original performance. The paper provides a novel perspective on task interference and proposes efficient solutions to leverage task-specific information effectively.
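The core idea can be illustrated with a minimal sketch. This is not the authors' implementation: the exact TALL-mask criterion and the threshold `lam` below are assumptions for illustration, as is the agreement count `k`. The sketch keeps a weight in a task's mask when that task's vector dominates the residual of the other tasks, then builds a consensus mask that keeps only weights claimed by at least `k` tasks, discarding *selfish* weights (used by a single task) and *catastrophic* weights (used by none):

```python
import numpy as np

def tall_mask(task_vector, multi_task_vector, lam=1.0):
    # Hypothetical TALL-mask criterion: keep entries where the task's
    # contribution outweighs the residual of all other tasks combined.
    return np.abs(task_vector) >= lam * np.abs(multi_task_vector - task_vector)

def consensus_merge(task_vectors, k=2, lam=1.0):
    # Multi-task vector via plain task arithmetic: sum of task vectors.
    mtv = np.sum(task_vectors, axis=0)
    masks = np.stack([tall_mask(t, mtv, lam) for t in task_vectors])
    # Consensus mask: keep weights activated in at least k task masks,
    # zeroing out "selfish" (one task) and "catastrophic" (no task) weights.
    consensus = masks.sum(axis=0) >= k
    return mtv * consensus

# Toy example with two 3-dimensional task vectors.
task_vectors = np.array([[1.0, 0.0, 2.0],
                         [1.0, 0.0, -0.1]])
merged = consensus_merge(task_vectors, k=2, lam=1.0)
```

In the toy example, the last coordinate is claimed only by the first task, so the consensus mask zeroes it out of the merged vector; in practice the same masks also enable the compression the paper reports, since each task checkpoint can be reconstructed from the shared multi-task vector plus a cheap binary mask.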