14 Dec 2014 | Ilya Sutskever, Oriol Vinyals, Quoc V. Le
This paper presents a sequence-to-sequence learning approach built from deep long short-term memory (LSTM) networks: one LSTM encodes the input sequence into a fixed-dimensional vector, and a second LSTM decodes the target sequence from that vector. On the WMT'14 English-to-French translation task, the LSTM achieves a BLEU score of 34.8 on the entire test set, higher than the 33.3 of a phrase-based SMT baseline, and when used to rerank the 1000 hypotheses produced by that SMT system it reaches 36.5, close to the previous best result on this task. The LSTM also learns sensible phrase and sentence representations that are sensitive to word order and relatively invariant to active versus passive voice. A further finding is that reversing the order of the words in the source sentences (but not the target sentences) significantly improves the LSTM's performance: it introduces many short-term dependencies between source and target, making the optimization problem easier. The LSTM's ability to handle long sentences is also demonstrated, which is surprising given previous experience with similar architectures, and the model's success suggests that a relatively unoptimized neural network approach can outperform a phrase-based SMT system on large-scale machine translation tasks.
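Below is a minimal sketch of this encoder-decoder setup in PyTorch. It is not the authors' implementation (the paper predates such frameworks); the `Seq2Seq` class name and the batch-first integer tensors are illustrative assumptions, though the layer count, cell count, and embedding size mirror those reported in the paper. Note the `torch.flip` call, which applies the source-reversal trick.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder LSTM compresses the (reversed)
    source into its final hidden/cell state, which initializes the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=1000, hidden=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # Reverse the source sequence (the paper's key trick): the first source
        # words end up closest to the first target words they help predict.
        src_rev = torch.flip(src, dims=[1])
        # The encoder's final (hidden, cell) state is the fixed-dimensional
        # summary of the whole source sentence.
        _, state = self.encoder(self.src_emb(src_rev))
        # The decoder unrolls over the target prefix, conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)  # per-step vocabulary logits
```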
The key technical contribution of this work is the simple trick of reversing the order of the words in the source sentences, which greatly improves the LSTM's performance. The model itself is a large deep LSTM with 4 layers, 1000 cells per layer, and 1000-dimensional word embeddings, trained to maximize the log probability of correct translations and evaluated with the cased BLEU score. The results show that this LSTM-based approach outperforms a phrase-based SMT system on the WMT'14 task, demonstrating the effectiveness of sequence-to-sequence learning with deep LSTMs.
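As an illustration of the training objective, the sketch below maximizes log p(target | source) by minimizing the summed negative log-likelihood of the reference translation with teacher forcing. It reuses the hypothetical `Seq2Seq` class from the earlier sketch; the plain-SGD learning rate of 0.7, the gradient-norm clip around 5, and the 160k/80k vocabulary sizes follow values reported in the paper, while `PAD`, `train_step`, and the batch tensors are assumptions for this example.

```python
import torch
import torch.nn as nn

PAD = 0  # assumed padding index for target batches

model = Seq2Seq(src_vocab=160_000, tgt_vocab=80_000)  # vocab sizes as in the paper
criterion = nn.CrossEntropyLoss(ignore_index=PAD, reduction="sum")
optimizer = torch.optim.SGD(model.parameters(), lr=0.7)  # plain SGD, as reported

def train_step(src, tgt):
    """One step of maximizing the log probability of the correct translation."""
    # Teacher forcing: the decoder sees the gold prefix tgt[:, :-1] and must
    # predict the next token tgt[:, 1:] at every position.
    logits = model(src, tgt[:, :-1])
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    # Clip the gradient norm (the paper enforces a hard constraint around 5)
    # to keep training stable despite occasional exploding gradients.
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()
```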