21 Aug 2017 | Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
Google researchers propose a method to use a single Neural Machine Translation (NMT) model for translating between multiple languages. The approach involves adding an artificial token at the beginning of the input sentence to specify the target language, while keeping the rest of the model architecture unchanged. This simplifies training and deployment, improves translation quality on low-resource languages, and enables zero-shot translation, where the model translates between language pairs it has never seen during training. The paper demonstrates the effectiveness of this approach on various benchmarks and production datasets, showing significant improvements over individual single-language-pair models. The models also exhibit interesting behaviors, such as handling code-switched input, mixing target languages, and learning what appears to be a universal interlingual representation. The method has been deployed in a Google-scale production setting, enabling efficient translation across a wide range of languages.
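A minimal sketch of the data-preparation idea, assuming a `<2xx>`-style target-language token and an illustrative helper `add_target_token` (names are mine, not from the paper); the point is that only the input text changes, while the shared encoder-decoder model and training procedure stay exactly the same:

```python
from typing import List, Tuple

def add_target_token(src: str, target_lang: str) -> str:
    """Prepend an artificial token telling the model which language to translate into."""
    return f"<2{target_lang}> {src}"

# Mixed multilingual parallel corpus: (source sentence, target sentence, target language code).
corpus: List[Tuple[str, str, str]] = [
    ("How are you?", "¿Cómo estás?", "es"),
    ("Wie geht es dir?", "How are you?", "en"),
]

# Training pairs for one shared model: the architecture is untouched, only inputs are tagged.
training_pairs = [(add_target_token(src, lang), tgt) for src, tgt, lang in corpus]

for inp, out in training_pairs:
    print(inp, "->", out)
# <2es> How are you? -> ¿Cómo estás?
# <2en> Wie geht es dir? -> How are you?
```

Because the target-language token is just another vocabulary item, the same trick covers zero-shot pairs at inference time: tagging a Portuguese sentence with `<2es>` requests Spanish output even if no Portuguese-Spanish examples were ever seen in training.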