A Simple, Fast, and Effective Reparameterization of IBM Model 2 | 9-14 June 2013 | Chris Dyer, Victor Chahuneau, Noah A. Smith
This paper presents a simple, fast, and effective reparameterization of IBM Model 2 that overcomes Model 1's overly strong independence assumptions and Model 2's overparameterization. A log-linear reparameterization of the alignment distribution avoids both problems while keeping inference, likelihood evaluation, and parameter estimation efficient. Training is consistently about ten times faster than Model 4, and systems built with the resulting alignments outperform IBM Model 4 on three large-scale translation tasks.
The model is a variant of the lexical translation models of Brown et al. (1993). It first generates the length of the target sentence, then an alignment linking each target position to a source position (or to a null token), and finally each target word conditioned on its aligned source word. The alignment distribution is parameterized by a null-alignment probability $ p_0 $ and a precision $ \lambda $, which controls how strongly the model favors alignment points close to the diagonal.
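Concretely, the alignment distribution has the following form (notation follows the paper, with $ i $ a target position, $ m $ the target length, $ j $ a source position, and $ n $ the source length); $ h $ measures closeness to the diagonal:

```latex
h(i, j, m, n) = -\left| \frac{i}{m} - \frac{j}{n} \right|

\delta(a_i = j \mid i, m, n) =
\begin{cases}
p_0 & \text{if } j = 0 \\[4pt]
(1 - p_0) \cdot \dfrac{e^{\lambda h(i, j, m, n)}}{Z_\lambda(i, m, n)} & \text{if } 1 \le j \le n
\end{cases}
```

where $ Z_\lambda(i, m, n) = \sum_{j'=1}^{n} e^{\lambda h(i, j', m, n)} $ normalizes over the non-null positions. As $ \lambda \to 0 $ the distribution approaches Model 1's uniform alignment; larger $ \lambda $ concentrates probability mass near the diagonal.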
The paper discusses efficient inference: computing the marginal likelihood and posterior alignment probabilities, and estimating parameters with EM. Evaluating the partition function naively costs $ O(n) $ per target position, but a geometric-series identity reduces it to constant time; the gradient of the log-partition function needed to update $ \lambda $ is likewise computed in closed form using an arithmetico-geometric series formula.
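The constant-time partition-function trick can be sketched as follows. Because moving one source position away from the diagonal multiplies $ e^{\lambda h} $ by a fixed ratio $ e^{-\lambda/n} $, the $ n $ terms of $ Z_\lambda $ split at the diagonal into two geometric series, each with a closed-form sum. This is a sketch under the paper's definitions (assuming $ \lambda > 0 $); the helper names and the brute-force check are ours:

```python
import math

def z_partition(i, m, n, lam):
    """O(1) evaluation of Z_lambda(i, m, n) = sum_{j=1..n} exp(lam * h(i,j,m,n)),
    with h(i,j,m,n) = -|i/m - j/n|, via two geometric series split at the diagonal."""
    r = math.exp(-lam / n)                 # ratio between adjacent source positions

    def geo(first, k):                     # closed-form sum of k terms: first * (1 - r^k) / (1 - r)
        return first * (1.0 - r ** k) / (1.0 - r) if k > 0 else 0.0

    j_up = math.floor(i * n / m)           # last source position at or left of the diagonal
    up = geo(math.exp(lam * -abs(i / m - j_up / n)), j_up)            # j = j_up, ..., 1
    down = geo(math.exp(lam * -abs(i / m - (j_up + 1) / n)), n - j_up)  # j = j_up+1, ..., n
    return up + down

def z_brute(i, m, n, lam):
    """Direct O(n) sum, for checking the closed form."""
    return sum(math.exp(lam * -abs(i / m - j / n)) for j in range(1, n + 1))
```

The same split underlies the gradient computation: differentiating the log-partition function with respect to $ \lambda $ yields sums of the form $ \sum_j h \cdot e^{\lambda h} $, which are arithmetico-geometric series with analogous closed forms.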
Experiments show that the proposed model is significantly faster to train than Model 4 and produces better translation quality. The model's alignments lead to consistently better scores than Model 4's on held-out test sets. The model is also shown to perform well in downstream translation systems on various language pairs. The paper concludes that the reparameterized IBM Model 2 is a compelling replacement for the standard Model 4.