Recurrent Continuous Translation Models


18-21 October 2013 | Nal Kalchbrenner, Phil Blunsom
This paper introduces Recurrent Continuous Translation Models (RCTMs), a class of probabilistic continuous translation models that operate on continuous representations of words, phrases, and sentences and do not rely on alignments or phrasal translation units. Each model has a generation aspect and a conditioning aspect: generation of the translation is modeled with a target Recurrent Language Model (RLM), while conditioning on the source sentence is modeled with a Convolutional Sentence Model (CSM). The RCTMs achieve large improvements in perplexity over state-of-the-art alignment-based translation models, with a perplexity more than 43% lower than that of a state-of-the-art variant of IBM Model 2. They are also highly sensitive to the word order, syntax, and meaning of the source sentence, and they match the performance of a state-of-the-art translation system when rescoring n-best lists of translations.
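To make the generation/conditioning split concrete, the sketch below shows one way a recurrent language model can generate target words while being conditioned on a fixed source-sentence vector. The matrix names (R, I, S, O), the dimensions, and the greedy decoding loop are illustrative assumptions, not the paper's exact parameterization; the random source vector merely stands in for the output of the CSM.

```python
import numpy as np

rng = np.random.default_rng(0)

d_hid, d_emb, vocab = 128, 128, 1000             # hidden, embedding, target vocab sizes (arbitrary)
R = rng.normal(scale=0.1, size=(d_hid, d_hid))   # recurrent weights
I = rng.normal(scale=0.1, size=(d_hid, d_emb))   # previous-target-word weights
S = rng.normal(scale=0.1, size=(d_hid, d_emb))   # source-conditioning weights
O = rng.normal(scale=0.1, size=(vocab, d_hid))   # output projection
E = rng.normal(scale=0.1, size=(vocab, d_emb))   # target word embeddings

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def generate(source_vec, max_len=10, bos=0):
    """Greedily emit target word ids, conditioned on a source sentence vector."""
    h = np.zeros(d_hid)
    word = bos
    out = []
    for _ in range(max_len):
        # the hidden state depends on the previous state, the previous target word,
        # and the fixed source representation (in the RCTMs, produced by the CSM)
        h = np.tanh(R @ h + I @ E[word] + S @ source_vec)
        probs = softmax(O @ h)
        word = int(probs.argmax())
        out.append(word)
    return out

# stand-in for the sentence model's output; in the real model this is computed
# from the source sentence rather than sampled at random
source_vec = rng.normal(size=d_emb)
print(generate(source_vec))
```

Greedy decoding is used only to keep the example short; the paper trains and evaluates the models probabilistically rather than by greedy generation.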
The RCTMs are based on a general modeling framework that estimates the probability of a target sentence being a translation of a given source sentence. The RCTM I uses a CSM to build a single representation of the source sentence, which then conditions the target language model. The RCTM II introduces an intermediate representation: a truncated variant of the CSM first transforms the source word representations into representations for the target words, which in turn constrain the generation of the target sentence. Both models operate on continuous representations of the constituents and are trained as a single joint architecture; a minimal sketch of the CSM's compositional step is given after the experiment summary below.

The RCTMs are evaluated in four experiments. The first shows that they achieve significantly lower perplexity than IBM Model 1 and a state-of-the-art variant of IBM Model 2. The second and third probe the RCTM II's sensitivity to linguistic information in the source sentence: the second shows that the model is highly sensitive to word position and order, and the third shows that its generated translations exhibit notable morphological, syntactic, and semantic agreement with the source. The fourth tests rescoring performance, showing that the RCTMs match a state-of-the-art translation system when rescoring n-best lists of translations. Together, the results indicate that the RCTMs capture significant syntactic and semantic information from the source sentence and successfully transfer it to the target language.
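As a rough illustration of how a convolutional sentence model can compose word vectors into a single sentence vector, the sketch below repeatedly merges adjacent vectors with narrow one-dimensional convolutions until one vector remains. The fixed kernel width of two, the randomly initialized weights, and the tanh nonlinearity are simplifying assumptions; the paper's CSM uses a hierarchy of learned, level-specific kernels rather than these placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 128  # representation dimensionality (arbitrary)

def conv_layer(vectors, kernel):
    """Merge each window of adjacent vectors into one vector.
    `kernel` has shape (width, d, d): one weight matrix per position in the window."""
    width = kernel.shape[0]
    merged = []
    for i in range(len(vectors) - width + 1):
        window = vectors[i:i + width]
        merged.append(np.tanh(sum(kernel[j] @ window[j] for j in range(width))))
    return merged

def csm(word_vectors):
    """Compose a list of word vectors into a single sentence vector."""
    vectors = list(word_vectors)
    while len(vectors) > 1:
        # each layer shortens the sequence by one until a single vector is left
        kernel = rng.normal(scale=0.1, size=(2, d, d))
        vectors = conv_layer(vectors, kernel)
    return vectors[0]

sentence = [rng.normal(size=d) for _ in range(6)]  # stand-in source word embeddings
print(csm(sentence).shape)  # (128,)
```

In the RCTM I, a vector produced this way conditions the target language model; in the RCTM II, a truncated variant of the same compositional idea yields intermediate representations for the target words instead of a single sentence vector.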