Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

3 Sep 2014 | Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
This paper introduces a novel neural network model called the RNN Encoder–Decoder for statistical machine translation (SMT). The model consists of two recurrent neural networks (RNNs): one encodes a sequence of symbols into a fixed-length vector representation, and the other decodes that vector back into a sequence of symbols. The encoder and decoder are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

The model is evaluated on the WMT'14 English-to-French translation task, where it is used as part of a standard phrase-based SMT system by scoring the phrase pairs in the phrase table. The empirical results show that this approach improves translation performance: the system outperforms the baseline in terms of BLEU score, the RNN Encoder–Decoder contributes to the overall performance independently of other components, and combining it with a neural language model improves performance further. The RNN Encoder–Decoder is trained to learn a continuous-space representation of phrases that preserves both semantic and syntactic structure.

The model uses a novel hidden unit that adaptively controls how much of the previous state each unit remembers or forgets while processing a sequence. This unit is inspired by the LSTM unit but is simpler to compute and implement.
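As a concrete illustration of that gating mechanism, the following is a minimal NumPy sketch of a single step of the unit (a reset gate, an update gate, and a candidate activation). The parameter names, shapes, and initialization are assumptions made for this sketch, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_step(x, h_prev, p):
    """One step of the gated hidden unit described in the paper.

    x      : current input vector, shape (d_in,)
    h_prev : previous hidden state, shape (d_h,)
    p      : dict of weights; W* project the input, U* the hidden state
             (names are illustrative, not taken from the paper's code).
    """
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)          # reset gate
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)          # update gate
    h_cand = np.tanh(p["W"] @ x + p["U"] @ (r * h_prev))   # candidate state
    # The update gate interpolates between keeping the old state and
    # adopting the candidate, letting each unit remember or forget adaptively.
    return z * h_prev + (1.0 - z) * h_cand

# Toy usage: run the unit over a random 5-step sequence.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
p = {name: 0.1 * rng.standard_normal((d_h, d_in if name.startswith("W") else d_h))
     for name in ("W", "W_r", "W_z", "U", "U_r", "U_z")}
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):
    h = gated_step(x, h, p)
print(h)  # the final hidden state summarizes the whole sequence
```

This hidden unit is what later became widely known as the gated recurrent unit (GRU).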
The learned representations are also shown to capture linguistic regularities in the phrase table, which is important for improving translation quality. Moreover, the RNN Encoder–Decoder can generate well-formed target phrases without relying on the phrase table, suggesting that it could be used to replace or supplement the phrase table in future SMT systems. The model's ability to learn continuous-space representations of words and phrases suggests that it could also be useful for natural language processing tasks beyond SMT, and the proposed architecture has potential for further improvement and analysis, for example in other domains such as speech transcription.
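To show how such a model slots into a phrase-based system, here is a small, self-contained sketch that appends the encoder-decoder's conditional log-probability as an extra feature for each phrase pair. The `log_prob` callable stands in for a trained model, and the phrase-table format is a simplifying assumption for illustration.

```python
from typing import Callable, List, Tuple

# Stand-in for a trained RNN Encoder-Decoder exposing log p(target | source).
# Its interface here is an assumption for this sketch.
LogProbFn = Callable[[List[str], List[str]], float]

# (source phrase, target phrase, existing feature values)
PhraseEntry = Tuple[List[str], List[str], List[float]]

def rescore_phrase_table(table: List[PhraseEntry],
                         log_prob: LogProbFn) -> List[PhraseEntry]:
    """Append the encoder-decoder score to each phrase pair's feature vector,
    so the log-linear SMT model can weight it alongside existing features."""
    return [(src, tgt, feats + [log_prob(src, tgt)]) for src, tgt, feats in table]

# Toy usage with a dummy scorer; a real system would load the trained model.
dummy_scorer: LogProbFn = lambda src, tgt: -0.1 * (len(src) + len(tgt))
table = [(["the", "black", "cat"], ["le", "chat", "noir"], [0.7, 0.4])]
print(rescore_phrase_table(table, dummy_scorer))
```

Adding the score as a separate feature, rather than replacing existing ones, lets the standard tuning procedure decide how much weight to give it.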