MAGMAX: Leveraging Model Merging for Seamless Continual Learning


29 Jul 2024 | Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzciński, and Sebastian Cygert
MAGMAX is a continual learning approach that leverages model merging to let large pre-trained models learn from a stream of new tasks without forgetting previously acquired knowledge. Unlike traditional continual learning methods, which focus on reducing forgetting during task training, MAGMAX combines sequential fine-tuning with maximum-magnitude weight selection to integrate knowledge across tasks after training. The paper opens with an extensive evaluation of model merging techniques and finds, somewhat surprisingly, that simple approaches such as weight averaging and random weight selection already perform well in many continual learning settings. Building on this, MAGMAX merges task vectors, the differences between fine-tuned and pre-trained weights, by selecting, for each parameter, the task update with the largest magnitude. The evaluation shows that MAGMAX outperforms existing methods in both class- and domain-incremental learning and achieves state-of-the-art results on multiple continual learning benchmarks.
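To make the core operation concrete, here is a minimal sketch of maximum-magnitude task-vector merging in PyTorch. It is an illustration based on the paper's description, not the authors' code: the function name magmax_merge, the state-dict interface, and the scale argument (the usual task-vector scaling coefficient) are assumptions made here.

```python
import torch

def magmax_merge(pretrained, finetuned_checkpoints, scale=1.0):
    """Merge task vectors by per-parameter maximum-magnitude selection.

    pretrained:            state dict of the pre-trained model (theta_0).
    finetuned_checkpoints: list of state dicts, one per task (theta_1..theta_T).
    scale:                 scaling coefficient applied to the merged task vector.
    """
    merged = {}
    for name, theta0 in pretrained.items():
        # Task vectors tau_t = theta_t - theta_0, stacked along a new task dim.
        taus = torch.stack([ckpt[name] - theta0 for ckpt in finetuned_checkpoints])
        # For every individual weight, find the task whose update has the
        # largest absolute magnitude ...
        idx = taus.abs().argmax(dim=0, keepdim=True)
        # ... and keep that update's signed value.
        tau_max = torch.gather(taus, 0, idx).squeeze(0)
        merged[name] = theta0 + scale * tau_max
    return merged
```

In the sequential variant the method builds on, each checkpoint in finetuned_checkpoints comes from fine-tuning on task t starting from the task t-1 weights rather than from the pre-trained model; the merge is still computed relative to the original pre-trained weights, and the result can be loaded with model.load_state_dict(merged).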
The paper also draws broader conclusions. Maximum-magnitude selection improves existing continual learning methods when used as their merging step, and sequential fine-tuning boosts the performance of models combined with a variety of merging techniques. The analysis attributes this to sequential fine-tuning reducing sign conflicts between task-specific models, a major source of interference when merging. Ablations further examine parameter selection strategies and the contribution of individual task vectors. Taken together, the findings position model merging as a promising way to consolidate knowledge after training rather than during it. Experiments on CIFAR100, ImageNet-R, and DomainNet show significant improvements over existing methods in both class- and domain-incremental scenarios, highlighting model merging as a viable approach to the challenges of continual learning.
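The sign-conflict observation can be checked with a simple diagnostic: count the coordinates where two task vectors point in opposite directions. The sketch below is an illustrative approximation rather than the paper's exact metric; the function name and the eps threshold for what counts as a non-negligible update are choices made here.

```python
import torch

def sign_conflict_fraction(pretrained, ckpt_a, ckpt_b, eps=1e-8):
    """Fraction of weights whose updates for two tasks have opposite signs."""
    conflicts, considered = 0, 0
    for name, theta0 in pretrained.items():
        tau_a = ckpt_a[name] - theta0
        tau_b = ckpt_b[name] - theta0
        # Only consider coordinates where both tasks made a non-negligible update.
        active = (tau_a.abs() > eps) & (tau_b.abs() > eps)
        # A conflict: the two updates point in opposite directions.
        opposite = (tau_a.sign() * tau_b.sign()) < 0
        conflicts += (active & opposite).sum().item()
        considered += active.sum().item()
    return conflicts / max(considered, 1)
```

Under the paper's finding, this fraction should be noticeably lower when ckpt_a and ckpt_b come from sequential fine-tuning than when each task is fine-tuned independently from the pre-trained model.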