Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

20 Jun 2024 | Hasan Abed Al Kader Hammoud, Umberto Michieli, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem, Mete Ozay
This paper investigates the impact of model merging on safety alignment in large language models (LLMs). While merging multiple expert LLMs can combine their domain expertise into a single model, existing merging techniques often fail to preserve safety alignment, producing misaligned merged models. Since alignment is critical for the safe deployment of LLMs, the authors propose a safety-aware merging approach that incorporates safety alignment data directly into the merging process.

The approach has two steps: (1) generate synthetic safety data and domain-specific data, and (2) incorporate these generated data into the optimization objective of existing data-aware model merging techniques. In this way, safety alignment is treated as a skill in its own right, on par with domain expertise, and is explicitly maximized in the resulting merged LLM.

Evaluated on several benchmarks and across a range of merging conditions, the safety-aware method significantly improves alignment without compromising domain accuracy, and it outperforms existing merging methods on both alignment and domain performance.

The authors also discuss the limitations of the approach: it assumes that at least one model in the merging pool is sufficiently aligned, and it imposes restrictions on model architectures and prompt templates. Despite these limitations, they argue that the work opens a new research direction at the intersection of model merging and safety alignment.
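To make the second step concrete, the sketch below shows one way a data-aware merge could fold safety data into its objective: task-arithmetic merging coefficients are searched so that the merged model minimizes language-modeling loss on both synthetic domain batches and synthetic safety batches (e.g., refusals paired with unsafe prompts). This is a minimal illustration under stated assumptions, not the authors' released implementation; the model IDs, the pre-tokenized batches, and the random search (standing in for the optimizer of a real data-aware merging technique) are all hypothetical.

```python
# Minimal sketch of safety-aware, data-driven merging (illustrative only).
# Assumes all models share one architecture, matching the paper's stated
# restriction; checkpoint names and data batches are hypothetical.
import torch
from transformers import AutoModelForCausalLM

BASE_ID = "org/base-llm"                          # hypothetical model IDs
EXPERT_IDS = ["org/expert-med", "org/expert-code"]

def task_vector(base_sd, expert_sd):
    # Task vector: expert weights minus base weights, per parameter.
    return {k: expert_sd[k] - base_sd[k] for k in base_sd}

def merged_state_dict(base_sd, task_vectors, alphas):
    # Weighted task arithmetic: theta = theta_base + sum_i alpha_i * tau_i.
    merged = {k: v.clone() for k, v in base_sd.items()}
    for alpha, tv in zip(alphas, task_vectors):
        for k in merged:
            if merged[k].is_floating_point():  # skip integer buffers
                merged[k] += alpha * tv[k]
    return merged

@torch.no_grad()
def avg_lm_loss(model, batches):
    # Mean causal-LM loss over pre-tokenized batches (labels = input_ids).
    return sum(model(**b, labels=b["input_ids"]).loss.item()
               for b in batches) / len(batches)

def objective(alphas, model, base_sd, tvs, domain_batches, safety_batches, lam=1.0):
    # The crux of the method: the merging objective is evaluated on BOTH
    # synthetic domain data and synthetic safety data, so alignment is
    # optimized like any other skill.
    model.load_state_dict(merged_state_dict(base_sd, tvs, alphas))
    return avg_lm_loss(model, domain_batches) + lam * avg_lm_loss(model, safety_batches)

def search_alphas(model, base_sd, tvs, domain_batches, safety_batches, trials=50):
    # Black-box random search over merging coefficients; a real data-aware
    # merger would use a stronger optimizer, but the objective is the point.
    best, best_loss = None, float("inf")
    for _ in range(trials):
        alphas = torch.rand(len(tvs)).tolist()
        loss = objective(alphas, model, base_sd, tvs, domain_batches, safety_batches)
        if loss < best_loss:
            best, best_loss = alphas, loss
    return best

# Usage: clone the base state dict so merging never mutates it in place.
base = AutoModelForCausalLM.from_pretrained(BASE_ID)
base_sd = {k: v.clone() for k, v in base.state_dict().items()}
tvs = [task_vector(base_sd, AutoModelForCausalLM.from_pretrained(e).state_dict())
       for e in EXPERT_IDS]
# domain_batches / safety_batches: lists of tokenizer(...) output dicts
# built from the synthetic data of step (1).
```

The `lam` knob trades alignment against domain loss; the essential design choice, per the paper, is simply that safety data enters the merging objective at all, rather than alignment being left to survive the merge by accident.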