2024 | Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jiménez, François Fleuret, Pascal Frossard
This paper introduces TALL-masks, a method for localizing task-specific information in multi-task vectors to improve model merging and compression. The authors show that task-specific information is preserved after merging, but that task arithmetic fails to exploit it because of interference between tasks. TALL-masks identifies, for each task, the parameters of the multi-task vector that are relevant to that task, enabling efficient compression of the individual fine-tuned checkpoints: reconstructing each model from the zero-shot weights plus the masked multi-task vector retains over 99% of the original performance while sharply reducing storage, e.g. from 57 GB to 8.2 GB at 99.7% of the original accuracy in their experiments. Building on these masks, the authors propose Consensus Merging, which removes "selfish" weights (relevant to only a single task) and "catastrophic" weights (relevant to none) before merging, and which can be combined with existing merging approaches. Experiments on vision and NLP benchmarks show that Consensus Merging consistently improves prior methods and achieves state-of-the-art results, and that the approach remains robust as the number of tasks grows. The paper also studies the effect of the mask threshold, showing a favorable trade-off between performance and storage. Overall, the work provides a practical recipe for compressing fine-tuned foundation models and improving model merging.
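As a rough illustration of the two ideas described above, the sketch below builds per-task masks by comparing each task vector against the rest of the merged multi-task vector, and then keeps only weights selected by at least `k` task masks. The function names, the exact thresholding rule, and the choice of `k` and the per-task thresholds are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np


def tall_masks(task_vectors, lambdas):
    """Sketch of TALL-style mask construction (assumed rule, for illustration).

    A parameter is marked relevant for task t when the task vector's magnitude
    dominates the magnitude of the remaining merged contribution, scaled by a
    per-task threshold lambda_t.
    """
    multi_task_vector = np.sum(task_vectors, axis=0)  # tau_MTL = sum_t tau_t
    masks = []
    for tau_t, lam_t in zip(task_vectors, lambdas):
        # Keep weights where |tau_t| >= lambda_t * |tau_MTL - tau_t|
        masks.append(np.abs(tau_t) >= lam_t * np.abs(multi_task_vector - tau_t))
    return multi_task_vector, masks


def consensus_merge(task_vectors, masks, k=2):
    """Keep only weights selected by at least k task masks, then merge.

    With k=2 this drops "selfish" weights (selected by exactly one task)
    and "catastrophic" weights (selected by none).
    """
    usage = np.sum(np.stack(masks).astype(int), axis=0)  # tasks using each weight
    consensus_mask = usage >= k
    return consensus_mask * np.sum(task_vectors, axis=0)


# Toy usage: three random "task vectors" for a 10-parameter model.
rng = np.random.default_rng(0)
task_vectors = [rng.normal(size=10) for _ in range(3)]
mtl_vector, masks = tall_masks(task_vectors, lambdas=[0.5, 0.5, 0.5])

# Compression view: store one multi-task vector plus a binary mask per task,
# and approximate task t's checkpoint as zero-shot weights + masks[t] * mtl_vector.
theta_task0_delta = masks[0] * mtl_vector

# Merging view: a single merged vector with selfish/catastrophic weights removed.
merged_vector = consensus_merge(task_vectors, masks, k=2)
```

In this reading, the same masks serve both purposes: applied per task they compress the set of checkpoints down to one shared vector plus cheap binary masks, and aggregated across tasks they filter the merged vector before any standard merging scheme is applied.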