22 Jan 2019 | Guillaume Lample* and Alexis Conneau*
This paper presents a cross-lingual language model pretraining approach that extends generative pretraining for English natural language understanding to multiple languages. The authors propose two methods for learning cross-lingual language models (XLMs): an unsupervised method that uses only monolingual data, and a supervised method that leverages parallel data with a new cross-lingual language model objective. The XLMs are evaluated on cross-lingual classification and on unsupervised and supervised machine translation. On the XNLI benchmark, the approach achieves a 4.9% absolute gain in accuracy. On unsupervised machine translation, the model reaches 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, it sets a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. The authors also show that cross-lingual language models can significantly improve the perplexity of low-resource languages. The code and pretrained models will be made publicly available.

A central contribution is the new translation language modeling (TLM) objective, which improves cross-lingual pretraining by leveraging parallel data. TLM extends masked language modeling (MLM): instead of masking tokens in a single stream of consecutive sentences, it concatenates a sentence with its translation and masks tokens on both sides, so the model can draw on context from either language to predict them. The authors demonstrate that combining TLM with MLM significantly improves performance on cross-lingual tasks.
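To make the MLM/TLM distinction concrete, here is a minimal sketch, in Python, of how a TLM training example might be assembled: a sentence and its translation are concatenated into one stream, and tokens are masked on both sides so the model can rely on either language to recover them. The helper name, special tokens, and the 15% masking rate are illustrative assumptions rather than details taken from the paper; the full objective also adds language embeddings and resets the target sentence's position ids, which this sketch omits.

```python
import random

MASK, BOS, EOS = "[MASK]", "<s>", "</s>"
MASK_PROB = 0.15  # BERT-style masking rate; an assumption for illustration


def make_tlm_example(src_tokens, tgt_tokens, mask_prob=MASK_PROB, seed=None):
    """Build one translation language modeling (TLM) example.

    A source sentence and its translation are concatenated into a single
    stream; tokens are masked on BOTH sides, so predicting a masked source
    token can use context from the target sentence (and vice versa).
    Returns (inputs, labels), where labels is None at unmasked positions.
    """
    rng = random.Random(seed)
    stream = [BOS] + src_tokens + [EOS] + [BOS] + tgt_tokens + [EOS]
    inputs, labels = [], []
    for tok in stream:
        if tok not in (BOS, EOS) and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # model is trained to predict the original token
        else:
            inputs.append(tok)
            labels.append(None)  # position excluded from the loss
    return inputs, labels


if __name__ == "__main__":
    src = "the cat sits on the mat".split()
    tgt = "le chat est assis sur le tapis".split()
    inputs, labels = make_tlm_example(src, tgt, seed=0)
    print(inputs)
    print(labels)
```

Dropping the second sentence from the stream and masking only the first recovers ordinary MLM, which is the unsupervised variant that needs no parallel data.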
The paper also shows that cross-lingual language models can be used to learn unsupervised cross-lingual word embeddings. The authors compare three approaches: MUSE, Concat, and XLM (MLM), and find that XLM (MLM) outperforms the other two on cross-lingual word similarity. The paper concludes that cross-lingual language model pretraining has a strong impact on natural language understanding tasks and that the TLM objective is a key contribution.
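As a rough illustration of the XLM (MLM) route to word embeddings, the sketch below pulls word vectors from a shared embedding lookup table and scores cross-lingual pairs with cosine similarity. The vocabulary and embedding matrix here are toy stand-ins, not the released XLM checkpoints; in the paper, the vectors come from the lookup table of an XLM trained with MLM over a shared BPE vocabulary.

```python
import numpy as np

# Toy stand-ins: in practice the vocabulary is a shared BPE vocabulary and the
# matrix is the token embedding table of a pretrained XLM (MLM) model.
vocab = {"cat": 0, "chat": 1, "dog": 2, "chien": 3}
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(vocab), 8))  # shape: (vocab_size, dim)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def word_similarity(w1, w2):
    """Similarity of two words in the shared cross-lingual embedding space."""
    return cosine(embeddings[vocab[w1]], embeddings[vocab[w2]])


print(word_similarity("cat", "chat"))   # English/French translation pair
print(word_similarity("cat", "chien"))  # unrelated pair
```

With real XLM embeddings, translation pairs such as "cat"/"chat" should score higher than unrelated pairs, which is what the cross-lingual word similarity evaluation measures.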