AdaMergeX is a novel cross-lingual transfer method that leverages adaptive adapter merging to improve the performance of large language models (LLMs) across languages. To address the scarcity of training data in the target language, it decouples "task ability" from "language ability" and models the gap between the target and source languages rather than learning the target task in the target language directly. Its key assumption is that the divergence between adapters fine-tuned on different languages follows the same distribution across tasks, which allows task ability acquired in a source language to be transferred to a target language.
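Stated in symbols (the notation here is illustrative, not taken from the paper): let $\theta_{T,\ell}$ denote the adapter fine-tuned on task $T$ in language $\ell$. The assumption is that the cross-language divergence is roughly task-independent,
$$\theta_{T_1,\ell_{tgt}} \ominus \theta_{T_1,\ell_{src}} \;\approx\; \theta_{T_2,\ell_{tgt}} \ominus \theta_{T_2,\ell_{src}}$$
for any tasks $T_1$ and $T_2$, where $\ominus$ is a structure-dependent divergence operation, so a divergence measured on a cheap reference task can stand in for the divergence on the target task.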
The method introduces a reference task to estimate the divergence between the target and source languages, and then merges this language ability with the task ability learned in the source language to obtain an adapter for the target task in the target language. The merging is structure-adaptive: it follows the way each adapter type is combined with the language model, merging additively for LoRA, whose low-rank updates are added to the base weights, and multiplicatively for $(IA)^3$, whose learned vectors rescale activations. With this scheme, AdaMergeX outperforms existing methods across various multilingual tasks.
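A minimal sketch of that merging rule, assuming adapters are represented as dictionaries of tensors; the helper names below are illustrative and not the authors' code:

```python
import torch

def merge_lora(task_src, ref_tgt, ref_src):
    """Additive merge for LoRA-style adapters, whose low-rank updates add to the weights.

    task_src: adapter fine-tuned on the target task in the source language.
    ref_tgt / ref_src: adapters fine-tuned on the reference task in the
    target / source language; their difference approximates the language gap.
    All arguments are dicts mapping parameter names to tensors.
    """
    return {name: task_src[name] + (ref_tgt[name] - ref_src[name])
            for name in task_src}

def merge_ia3(task_src, ref_tgt, ref_src, eps=1e-8):
    """Multiplicative merge for (IA)^3-style adapters, whose vectors rescale activations."""
    return {name: task_src[name] * ref_tgt[name] / (ref_src[name] + eps)
            for name in task_src}

# Toy usage with random tensors standing in for adapter parameters.
shape = (8, 4)
task_src = {"layer0.lora_A": torch.randn(shape)}
ref_tgt = {"layer0.lora_A": torch.randn(shape)}
ref_src = {"layer0.lora_A": torch.randn(shape)}
merged = merge_lora(task_src, ref_tgt, ref_src)
```

The epsilon only guards against division by zero in the multiplicative case; since $(IA)^3$ vectors are initialized near one, the ratio is typically well behaved.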
The experiments show that AdaMergeX achieves significant improvements over existing approaches, including model merging, prompting, and general adapter merging methods. It outperforms baselines such as MAD-X and Armerge, reaching higher scores on benchmarks such as XCOPA and XQuAD. The method is also robust to the choice of backbone model, source language, and reference task, delivering consistent performance across these settings.
The results indicate that AdaMergeX is effective in cross-lingual transfer, particularly for LLMs, and that the introduction of adaptive adapter merging significantly enhances the ability to transfer task proficiency across languages. The method is flexible and can be applied to different adapter structures, making it a promising approach for cross-lingual transfer in multilingual settings.