AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

29 Feb 2024 | Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing
AdaMergeX is a cross-lingual transfer method that addresses the challenge of limited training data in target languages by decoupling "task ability" from "language ability." Existing methods either fine-tune on the target task in a specific language or rely on translation, representation alignment, or prompting; AdaMergeX instead acknowledges the mutual reliance between the two abilities and proposes a structure-adaptive adapter merging approach to combine them. Concretely, it introduces a reference task and measures the divergence between adapters fine-tuned on that reference task in the source and target languages; this divergence captures language ability, so the target adapter can be obtained by combining the other three adapters (the task adapter in the source language and the reference-task adapters in both languages). Empirical results on a wide range of multilingual tasks covering 12 languages show that AdaMergeX outperforms existing methods, including model merging, prompting, and general adapter merging techniques, and that it is robust to the choice of backbone model, source language, and reference task.
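
To make the merging idea concrete, here is a minimal sketch assuming LoRA-style adapters whose weight updates combine additively. The function name, dictionary layout, and scaling factor `lam` are illustrative assumptions, not the paper's exact implementation; the structure-adaptive aspect of AdaMergeX means that multiplication-based adapters such as (IA)³ would use an element-wise multiplicative combination instead.

```python
import torch

def merge_adapters_additive(task_src, ref_src, ref_tgt, lam=1.0):
    """Sketch: estimate the target-language task adapter from three known adapters.

    task_src: adapter fine-tuned on the target task in the source language
    ref_src:  adapter fine-tuned on the reference task in the source language
    ref_tgt:  adapter fine-tuned on the reference task in the target language

    The difference (ref_tgt - ref_src) approximates the "language ability"
    shift, which is added to the source-language "task ability". `lam` is a
    hypothetical scaling factor for the language shift.
    """
    return {
        name: task_src[name] + lam * (ref_tgt[name] - ref_src[name])
        for name in task_src
    }

# Toy usage with random adapter weights for a single layer.
shape = (16, 16)
task_src = {"layer0.lora_delta": torch.randn(shape)}
ref_src = {"layer0.lora_delta": torch.randn(shape)}
ref_tgt = {"layer0.lora_delta": torch.randn(shape)}
task_tgt = merge_adapters_additive(task_src, ref_src, ref_tgt)
print(task_tgt["layer0.lora_delta"].shape)  # torch.Size([16, 16])
```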