14 Dec 2014 | Ilya Sutskever, Oriol Vinyals, Quoc V. Le
This paper presents a general end-to-end approach to sequence learning using deep neural networks, specifically Long Short-Term Memory (LSTM) networks. The authors address the challenge of mapping sequences to sequences, a task that traditional DNNs handle poorly because they require inputs and outputs of fixed dimensionality. The proposed method uses one LSTM to encode the input sequence into a fixed-dimensional vector and another LSTM to decode this vector into the target sequence. The main contribution is demonstrated on the WMT '14 English-to-French translation task, where an ensemble of LSTMs achieved a BLEU score of 34.8, outperforming a phrase-based SMT baseline. The LSTM also handled long sentences well. A key trick was reversing the word order of the source sentences (but not the targets), which introduces many short-term dependencies between source and target words and makes the optimization problem easier. The learned sentence representations are shown to be sensitive to word order while remaining relatively invariant to active versus passive voice, and the authors argue the approach should extend to other sequence-to-sequence problems given sufficient training data.
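To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch of the architecture described above. PyTorch, the class name `Seq2Seq`, and the embedding/hidden sizes are my illustrative choices, not the paper's (the original used deep multi-layer LSTMs with far larger dimensions); only the overall structure, encode to a fixed vector, decode from it, and the source-reversal trick follow the paper.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Illustrative sketch: one LSTM reads the (reversed) source sequence
    into a fixed-size state; a second LSTM generates the target from it.
    Hyperparameters here are placeholders, not the paper's settings."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source sequence: the paper found this introduces
        # short-term dependencies that ease optimization.
        src = torch.flip(src, dims=[1])
        # Encode: the final (hidden, cell) state serves as the
        # fixed-dimensional summary vector of the whole input sequence.
        _, state = self.encoder(self.src_embed(src))
        # Decode: condition the target LSTM on that state (teacher forcing
        # during training); project each step to target-vocabulary logits.
        dec_out, _ = self.decoder(self.tgt_embed(tgt), state)
        return self.out(dec_out)
```

At inference time the decoder would instead be run one token at a time, feeding back its own predictions; the paper used beam search over these per-step distributions rather than the greedy loop this sketch implies.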