22 Jan 2019 | Guillaume Lample* and Alexis Conneau*
This paper explores the effectiveness of cross-lingual language model pretraining, extending the successful approach of generative pretraining for English natural language understanding to multiple languages. The authors propose two methods: an unsupervised method that relies solely on monolingual data, and a supervised method that leverages parallel data through a new cross-lingual language model objective. The results show significant improvements on cross-lingual classification as well as on unsupervised and supervised machine translation. Specifically, the approach achieves an absolute gain of 4.9% accuracy on the XNLI benchmark, 34.3 BLEU on WMT'16 German-English for unsupervised machine translation, and 38.5 BLEU on WMT'16 Romanian-English for supervised machine translation. The code and pretrained models are publicly available. The paper also discusses the impact of cross-lingual language model pretraining on low-resource languages and on unsupervised cross-lingual word embeddings.
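To make the supervised objective more concrete, here is a minimal sketch, not the authors' released code, of how a parallel sentence pair could be turned into a cross-lingual (translation language modeling) training example: the two sentences are concatenated and tokens are masked on both sides, so the model can use context from either language to recover a masked word. The function name, the `[MASK]` token string, the `</s>` separator, and the 15% mask probability are illustrative assumptions for this example.

```python
import random

MASK = "[MASK]"

def make_tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, seed=None):
    """Build one translation-language-modeling-style example:
    concatenate a parallel sentence pair and mask tokens in BOTH
    languages, so the model is encouraged to align representations
    across languages to predict the masked words."""
    rng = random.Random(seed)
    tokens = src_tokens + ["</s>"] + tgt_tokens            # joint stream
    # Track which side each token comes from (the actual model also
    # uses language embeddings and per-sentence positions).
    langs = ["src"] * (len(src_tokens) + 1) + ["tgt"] * len(tgt_tokens)

    inputs, labels = [], []
    for tok in tokens:
        if tok != "</s>" and rng.random() < mask_prob:
            inputs.append(MASK)       # model must predict this token
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)       # no loss on unmasked positions
    return inputs, labels, langs

# Toy English-French pair; in the actual setup tokens would come from
# a shared subword (BPE) vocabulary across languages.
inp, lab, langs = make_tlm_example(
    ["the", "cat", "sits"], ["le", "chat", "est", "assis"], seed=0)
print(inp)
print(lab)
```

The unsupervised variant works the same way but over a single monolingual sentence at a time, with no parallel pair to concatenate.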