July 5 - 10, 2020 | Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
This paper presents XLM-R, a multilingual masked language model trained on text in 100 languages using more than two terabytes of filtered CommonCrawl data. XLM-R significantly outperforms multilingual BERT (mBERT) on cross-lingual benchmarks, gaining +14.6% average accuracy on XNLI, +13% average F1 on MLQA, and +2.4% F1 on NER. It is particularly strong on low-resource languages, improving XNLI accuracy over previous XLM models by 15.7% for Swahili and 11.4% for Urdu, and it sets a new state of the art for cross-lingual classification, sequence labeling, and question answering.

The paper also presents a detailed analysis of the trade-offs at scale between positive transfer and capacity dilution, and between high- and low-resource languages. It shows that performance improves with a larger model, more training data, and a larger shared vocabulary, and it studies how language sampling and vocabulary size affect results. Multilingual tokenization is simplified by applying SentencePiece directly to raw text, without degrading performance. Finally, under monolingual fine-tuning on the GLUE and XNLI benchmarks, XLM-R is competitive with strong monolingual models, showing that multilingual modeling at this scale does not have to sacrifice per-language performance.
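To make the language-sampling discussion concrete, below is a minimal sketch of the exponentiated (temperature-based) sampling used to rebalance high- and low-resource languages during multilingual pretraining, q_i ∝ p_i^α, with α = 0.3 as reported in the paper. The per-language corpus sizes in the sketch are made-up placeholders, not the actual CC-100 statistics.

```python
import numpy as np

# Hypothetical per-language corpus sizes (GB of CommonCrawl text);
# these are placeholders for illustration, not the real CC-100 numbers.
corpus_sizes = {"en": 300.0, "ru": 278.0, "sw": 1.6, "ur": 5.7}

def sampling_probs(sizes, alpha=0.3):
    """Exponentiated sampling distribution q_i ∝ p_i^alpha, where p_i is the
    fraction of the corpus in language i. A smaller alpha up-samples
    low-resource languages relative to their raw share of the data."""
    langs = list(sizes)
    p = np.array([sizes[l] for l in langs], dtype=float)
    p /= p.sum()          # raw data distribution p_i
    q = p ** alpha
    q /= q.sum()          # rebalanced sampling distribution q_i
    return dict(zip(langs, q))

print(sampling_probs(corpus_sizes))              # alpha = 0.3
print(sampling_probs(corpus_sizes, alpha=1.0))   # alpha = 1 recovers raw proportions
```

With α = 1 the model simply sees each language in proportion to its raw size; lowering α shifts probability mass toward low-resource languages such as Swahili and Urdu, at the cost of seeing slightly less of the high-resource ones.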
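As a usage-level illustration of the pretrained masked language model, here is a minimal sketch that queries the released checkpoint through the Hugging Face `transformers` library; this assumes that library and its hosted `xlm-roberta-base` checkpoint, which are not part of the paper itself, and the example sentences are illustrative only.

```python
from transformers import pipeline

# Load the public XLM-R base checkpoint as a fill-mask (masked LM) pipeline.
fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

# A single shared model and vocabulary handle the mask token across languages.
for text in ["Paris is the capital of <mask>.",
             "Paris est la capitale de la <mask>."]:
    print(text)
    for cand in fill_mask(text, top_k=3):
        print(f"  {cand['token_str']!r}  (score={cand['score']:.3f})")
```

The same checkpoint is what gets fine-tuned for the downstream experiments: in the zero-shot cross-lingual setting it is fine-tuned on English task data (e.g., English XNLI) and then evaluated directly on the other languages.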