Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

19 Jan 2024 | Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer
The paper "Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models" addresses the performance degradation of multilingual language models (MLMs) due to inter-language competition for model parameters. The authors propose Cross-lingual Expert Language Models (X-ELMs), which mitigate this competition by independently training language models on subsets of a multilingual corpus. This approach specializes X-ELMs to different languages while maintaining effective ensemble performance. Experiments show that X-ELMs outperform jointly trained multilingual models across various languages and that these gains extend to downstream tasks. Key contributions include the introduction of x-BTM, an extension of the Branch-Train-Merge (BTM) paradigm, which introduces balanced clustering based on typological similarity and Hierarchical Multi-Round (HMR) training for efficient adaptation to new languages. The paper also highlights the computational efficiency and adaptability of X-ELMs, making them a promising approach for multilingual modeling.The paper "Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models" addresses the performance degradation of multilingual language models (MLMs) due to inter-language competition for model parameters. The authors propose Cross-lingual Expert Language Models (X-ELMs), which mitigate this competition by independently training language models on subsets of a multilingual corpus. This approach specializes X-ELMs to different languages while maintaining effective ensemble performance. Experiments show that X-ELMs outperform jointly trained multilingual models across various languages and that these gains extend to downstream tasks. Key contributions include the introduction of x-BTM, an extension of the Branch-Train-Merge (BTM) paradigm, which introduces balanced clustering based on typological similarity and Hierarchical Multi-Round (HMR) training for efficient adaptation to new languages. The paper also highlights the computational efficiency and adaptability of X-ELMs, making them a promising approach for multilingual modeling.