29 Jul 2024 | Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzcinski, Sebastian Cygert
This paper introduces MAGMAX, a novel approach to continual learning that leverages model merging via maximum magnitude selection on top of sequential fine-tuning. The authors argue that large pre-trained models, now fundamental to complex machine learning systems, need to adapt continuously to new data without forgetting previously acquired knowledge. Whereas traditional continual learning methods focus on reducing forgetting during the training of each task, MAGMAX combines sequential fine-tuning with maximum magnitude weight selection to integrate the knowledge acquired across tasks into a single model.
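The mechanics can be sketched concretely. Assuming pre-trained weights θ₀ and checkpoints θ₁, ..., θ_T obtained by fine-tuning sequentially (each task starting from the previous checkpoint), merging forms task vectors τ_t = θ_t − θ₀ and keeps, for each parameter entry, the value with the largest magnitude across tasks. The minimal PyTorch sketch below follows that reading; the function name `max_magnitude_merge`, the plain state-dict representation, and the scaling coefficient `lam` are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def max_magnitude_merge(pretrained, checkpoints, lam=1.0):
    """Merge sequentially fine-tuned checkpoints into one model.

    pretrained:  dict of parameter name -> tensor (theta_0)
    checkpoints: list of dicts (theta_1 ... theta_T), each fine-tuned
                 starting from the previous checkpoint
    lam:         scaling coefficient applied to the merged task vector
    """
    merged = {}
    for name, theta0 in pretrained.items():
        # Task vectors: per-task deviation from the pre-trained weights.
        taus = torch.stack([ckpt[name] - theta0 for ckpt in checkpoints])
        # For each entry, pick the task-vector value with maximum magnitude.
        idx = taus.abs().argmax(dim=0, keepdim=True)
        tau_max = torch.gather(taus, 0, idx).squeeze(0)
        merged[name] = theta0 + lam * tau_max
    return merged
```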
The key contributions of the paper include:
1. An extensive examination of model merging techniques, revealing that simple approaches like weight averaging and random weight selection perform surprisingly well in various continual learning contexts (a brief sketch of these baselines follows this list).
2. The proposal of MAGMAX, a novel model-merging strategy that enables the continual learning of large pre-trained models for successive tasks.
3. Thorough evaluation demonstrating the superiority of MAGMAX in various scenarios, including class- and domain-incremental learning settings.
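For comparison, the simple baselines from contribution 1 reduce to one-line operations over the same fine-tuned checkpoints. The sketch below, with hypothetical helper names `average_merge` and `random_merge`, illustrates parameter-wise weight averaging and per-entry random selection; it is a plausible rendering of those baselines, not the paper's evaluation code.

```python
import torch

def average_merge(pretrained, checkpoints):
    # Weight averaging: parameter-wise mean of the fine-tuned checkpoints.
    return {name: torch.stack([c[name] for c in checkpoints]).mean(dim=0)
            for name in pretrained}

def random_merge(pretrained, checkpoints, seed=0):
    # Random selection: each entry is copied from a randomly chosen checkpoint.
    g = torch.Generator().manual_seed(seed)
    merged = {}
    for name, theta0 in pretrained.items():
        stacked = torch.stack([c[name] for c in checkpoints])
        idx = torch.randint(len(checkpoints), theta0.shape, generator=g).unsqueeze(0)
        merged[name] = torch.gather(stacked, 0, idx).squeeze(0)
    return merged
```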
The authors highlight that sequential fine-tuning simplifies model merging by reducing sign conflicts, which are a major source of interference when merging models. The maximum magnitude selection strategy then keeps, for each parameter, the task-vector value with the largest magnitude, preserving the most salient updates and enhancing the performance of the final merged model.
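The claim about sign conflicts can be made measurable: for each parameter entry, one can check whether any two task vectors disagree in sign, and the fraction of disagreeing entries quantifies potential interference. The sketch below, with the hypothetical helper `sign_conflict_rate`, shows one way such a rate could be computed; it is not necessarily the diagnostic used in the paper.

```python
import torch

def sign_conflict_rate(pretrained, checkpoints):
    """Fraction of parameter entries whose task vectors disagree in sign."""
    conflicts, total = 0, 0
    for name, theta0 in pretrained.items():
        taus = torch.stack([ckpt[name] - theta0 for ckpt in checkpoints])
        signs = torch.sign(taus)
        # An entry conflicts if at least one task vector is positive and another negative.
        has_pos = (signs > 0).any(dim=0)
        has_neg = (signs < 0).any(dim=0)
        conflicts += (has_pos & has_neg).sum().item()
        total += theta0.numel()
    return conflicts / total
```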
The paper also discusses the broader implications of their findings, showing that merging with maximum magnitude selection can improve existing continual learning methods and that sequential fine-tuning facilitates other merging techniques. The code for MAGMAX is available at <https://github.com/danielm1405/magmax>.