20 Sep 2015 | Minh-Thang Luong, Hieu Pham, Christopher D. Manning
This paper introduces two effective attention mechanisms for neural machine translation (NMT): a global approach that attends to all source words at each step, and a local approach that attends to a small subset of source words at a time. The authors evaluate these models on the WMT English-German translation tasks in both directions. The local attention model yields a gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. An ensemble of models with different attention architectures sets a new state-of-the-art result on the WMT'15 English-to-German task with 25.9 BLEU points, outperforming the previous best system by more than 1.0 BLEU point. The paper also compares several alignment (score) functions for attention, such as dot, general, and concat, finding that certain functions are better suited to specific attention variants, and conducts extensive analysis of learning curves, long-sentence handling, alignment quality, and translation outputs. The results show that attention-based NMT models outperform non-attentional ones in many cases, particularly when translating names and long sentences. The authors conclude that the proposed attention mechanisms are simple, effective, and improve the performance of NMT systems.
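To make the global variant concrete, below is a minimal NumPy sketch of the paper's alignment functions (dot: h_t^T h̄_s; general: h_t^T W_a h̄_s; concat: v_a^T tanh(W_a [h_t; h̄_s])) and the resulting context vector. The score formulas follow the paper, but the function names, shapes, and the variables `h_t`, `h_s`, `W_a`, `v_a` here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def score(h_t, h_s, kind="dot", W_a=None, v_a=None):
    """Luong-style alignment scores.
    h_t: decoder state, shape (d,); h_s: encoder states, shape (S, d).
    Note W_a has shape (d, d) for 'general' but (d_a, 2d) for 'concat'."""
    if kind == "dot":        # score = h_t^T h_s
        return h_s @ h_t
    if kind == "general":    # score = h_t^T W_a h_s
        return h_s @ (W_a @ h_t)
    if kind == "concat":     # score = v_a^T tanh(W_a [h_t; h_s])
        pairs = np.concatenate([np.tile(h_t, (len(h_s), 1)), h_s], axis=1)
        return np.tanh(pairs @ W_a.T) @ v_a
    raise ValueError(f"unknown alignment function: {kind}")

def global_attention(h_t, h_s, kind="dot", **params):
    # Attend over ALL source states: weights a_t, then context c_t.
    a_t = softmax(score(h_t, h_s, kind, **params))
    c_t = a_t @ h_s
    return c_t, a_t

# Toy usage with random states (7 source positions, hidden size 4).
rng = np.random.default_rng(0)
h_s = rng.standard_normal((7, 4))
h_t = rng.standard_normal(4)
c_t, a_t = global_attention(h_t, h_s, kind="dot")
```

In the paper the context vector c_t is then combined with h_t as h̃_t = tanh(W_c [c_t; h_t]) before predicting the next word; that final projection is omitted above for brevity.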
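The local variant can be sketched similarly. The paper's predictive version (local-p) predicts a window center p_t = S · sigmoid(v_p^T tanh(W_p h_t)), scores only positions within [p_t − D, p_t + D], and multiplies the alignment weights by a Gaussian centered at p_t with σ = D/2. The sketch below reuses `softmax` and `score` from above; the boundary clipping and all variable names (`W_p`, `v_p`, `D`) are my assumptions for illustration.

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_p_attention(h_t, h_s, W_p, v_p, D=10, kind="dot", **params):
    """Local-p attention: attend to a 2D+1 window around a predicted center.
    h_t: (d,); h_s: (S, d); W_p: (d_p, d); v_p: (d_p,)."""
    S = len(h_s)
    # Predicted (real-valued) center position in [0, S].
    p_t = S * sigmoid(v_p @ np.tanh(W_p @ h_t))
    # Window [p_t - D, p_t + D], clipped to the sentence (an implementation choice).
    lo, hi = max(0, int(p_t) - D), min(S, int(p_t) + D + 1)
    window = h_s[lo:hi]
    a_t = softmax(score(h_t, window, kind, **params))
    # Favor positions near p_t: multiply by exp(-(s - p_t)^2 / (2 sigma^2)),
    # sigma = D/2, as in the paper; the weights are not renormalized.
    s = np.arange(lo, hi)
    a_t = a_t * np.exp(-((s - p_t) ** 2) / (2 * (D / 2) ** 2))
    c_t = a_t @ window
    return c_t, a_t
```

Because only 2D+1 source states are scored per target word, local attention keeps cost roughly constant in sentence length, which is part of why it handles long sentences well in the paper's analysis.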