Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

21 Aug 2017 | Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
This paper presents a simple solution for multilingual Neural Machine Translation (NMT) using a single model. The approach introduces an artificial token at the beginning of the input sentence to specify the required target language, without changing the model architecture: the encoder, decoder, and attention module remain unchanged and are shared across all languages. This enables multilingual NMT with a single model and no increase in the number of parameters, making it significantly simpler than previous approaches. On WMT'14 benchmarks, a single multilingual model achieves performance comparable to state-of-the-art for English→French and surpasses state-of-the-art results for English→German; it likewise surpasses state-of-the-art results for French→English and German→English on the WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models with up to twelve language pairs improve translation quality for many individual pairs. The model can also learn implicit bridging between language pairs never seen explicitly during training, demonstrating that zero-shot translation is possible, and analyses suggest that it learns a shared, interlingua-like representation across languages. Experiments on various datasets show that the model performs well in zero-shot translation and that adding a small amount of additional parallel data further improves quality. The architecture also allows for code-switching on the source side and weighted mixing of target languages. The approach is simple, efficient, and effective, enabling translation between many languages with no increase in parameters and improved performance for low-resource languages, and it has been deployed in a production setting at Google, serving a large number of languages.
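To make the core mechanism concrete, here is a minimal sketch (in Python) of the data-preparation idea: prepend an artificial token naming the target language to every source sentence, so that one shared encoder-decoder can serve all language pairs. The exact token string and the `add_target_token` helper below are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of the artificial-token idea: the target language is encoded
# as a special token prepended to the source sentence. Token format ("<2xx>")
# and helper name are assumptions for illustration only.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prefix the source sentence with a token telling the shared
    encoder-decoder which language to translate into."""
    return f"<2{target_lang}> {source_sentence}"

# Training pairs for several target languages can then share one model,
# differing only in the prepended token:
examples = [
    (add_target_token("Hello, how are you?", "es"), "Hola, ¿cómo estás?"),
    (add_target_token("Hello, how are you?", "de"), "Hallo, wie geht es dir?"),
]

for src, tgt in examples:
    print(src, "->", tgt)
# <2es> Hello, how are you? -> Hola, ¿cómo estás?
# <2de> Hello, how are you? -> Hallo, wie geht es dir?
```

Because the target language is expressed purely in the input data, the same preprocessing step also supports zero-shot directions at inference time: one can request a target language for a source language pair that never co-occurred in training simply by choosing the corresponding token.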