Multilingual Denoising Pre-training for Neural Machine Translation

23 Jan 2020 | Yinhan Liu*, Jiatao Gu*, Naman Goyal*, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
This paper presents mBART, a multilingual sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages. mBART is the first method to pre-train a complete sequence-to-sequence model by denoising full texts in multiple languages; previous approaches pre-trained only the encoder, only the decoder, or reconstructed only parts of the text. Because the entire model is pre-trained, it can be fine-tuned directly for both supervised and unsupervised machine translation without task-specific modifications.

mBART yields significant gains across a range of machine translation tasks: up to 12 BLEU points in low-resource settings and over 5 BLEU points for document-level and unsupervised models. It also enables new kinds of transfer, including to language pairs with no bi-text and to languages that were not in the pre-training corpus. The model is pre-trained on a large-scale Common Crawl (CC25) corpus covering 25 languages, and the authors provide pre-trained models at several levels of multilinguality. Extensive analysis examines the factors behind effective pre-training, including the number of languages and their similarity, and shows that mBART consistently performs best across tasks. The results demonstrate improvements in both sentence-level and document-level translation, as well as effective unsupervised translation through back-translation and language transfer.

The paper also reviews related work on pre-training for text generation, multilinguality in NLP tasks, and document translation, underscoring the importance of multilingual pre-training for translation quality. It concludes that multilingual denoising pre-training substantially improves both supervised and unsupervised machine translation at the sentence and document levels, and that pre-training is essential for effective transfer learning. Future work includes scaling pre-training to more languages and exploring more efficient models for production deployment.
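To make the denoising objective concrete, the sketch below implements the two noise types the paper describes for mBART: text infilling, where spans of tokens are replaced by a single mask token with span lengths drawn from a Poisson distribution (the paper reports masking roughly 35% of words with λ = 3.5), and in-instance sentence permutation. This is a minimal illustration in plain Python; the function names (`add_noise`, `infill_spans`) and the mask symbol are illustrative and do not come from the authors' fairseq code.

```python
import math
import random

MASK = "<mask>"


def permute_sentences(sentences):
    """Randomly permute the order of sentences within one training instance."""
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled


def poisson_sample(lam):
    """Draw one sample from a Poisson distribution (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1


def infill_spans(tokens, mask_ratio=0.35, poisson_lambda=3.5):
    """Replace random token spans with a single mask token until roughly
    `mask_ratio` of the tokens have been removed."""
    tokens = tokens[:]
    num_to_mask = int(round(len(tokens) * mask_ratio))
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span_len = min(poisson_sample(poisson_lambda),
                       num_to_mask - masked,
                       len(tokens) - 1)
        start = random.randrange(0, len(tokens) - span_len + 1)
        tokens[start:start + span_len] = [MASK]  # whole span -> one mask token
        masked += max(span_len, 1)               # count at least 1 to terminate
    return tokens


def add_noise(sentences):
    """Build one noised training input: permute sentences, then infill spans.
    The seq2seq model is trained to reconstruct the original, un-noised text."""
    noised = []
    for sent in permute_sentences(sentences):
        noised.extend(infill_spans(sent.split()))
    return " ".join(noised)


if __name__ == "__main__":
    doc = ["The cat sat on the mat .", "It was a sunny day .", "Birds sang outside ."]
    print(add_noise(doc))
```

The training pair is (noised text, original text): the encoder reads the corrupted instance and the decoder is trained to regenerate the clean one, which is what lets the same pre-trained weights be fine-tuned directly for translation.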
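For a sense of how the pre-trained model is used downstream, the snippet below loads the 25-language checkpoint through the Hugging Face transformers port (model id `facebook/mbart-large-cc25`) and runs one supervised fine-tuning step on an English-Romanian sentence pair. Note this relies on a third-party port rather than the authors' released fairseq code, and the CC25 checkpoint is pre-trained only, so it must be fine-tuned on bi-text before it produces useful translations.

```python
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Sketch assuming the Hugging Face `transformers` port of mBART;
# the authors' original implementation and checkpoints are in fairseq.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="ro_RO"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

src = "UN Chief Says There Is No Military Solution in Syria"
tgt = "Şeful ONU declară că nu există o soluţie militară în Siria"

# The tokenizer appends the source language id to the input and prepares the
# target with its own language id, matching mBART's language-token format.
batch = tokenizer(src, text_target=tgt, return_tensors="pt")

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)  # illustrative LR
loss = model(**batch).loss   # standard seq2seq cross-entropy on the labels
loss.backward()
optimizer.step()             # one fine-tuning step; real training loops over bi-text
```

The same fine-tuning recipe applies to every language pair because the pre-trained encoder and decoder are shared across all 25 languages, which is what enables the transfer to unseen pairs discussed above.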