Minimum Error Rate Training in Statistical Machine Translation

Franz Josef Och
This paper presents a new approach to training statistical machine translation (SMT) models by directly optimizing translation quality as measured by automatic evaluation metrics. Traditional SMT training uses maximum likelihood or related criteria, which are only loosely related to final translation quality. The authors instead propose training criteria that directly optimize translation quality under metrics such as BLEU and word error rate (WER), introduce a new algorithm for efficiently optimizing an unsmoothed error count, and show that significantly better results can often be achieved by incorporating the final evaluation criterion into the training process.

The authors analyze several automatic evaluation criteria used in SMT, including multi-reference word error rate (mWER), multi-reference position-independent error rate (mPER), BLEU, and NIST. These metrics aim to approximate human assessment and often show strong correlation with human evaluation.

Two training criteria are proposed for minimum error rate training (MERT): one that directly optimizes an unsmoothed error count and one that optimizes a smoothed error count. For the unsmoothed criterion, the authors describe a new optimization algorithm that is more stable and less prone to poor local optima than the standard grid-based search. The underlying translation model is log-linear, and the algorithm scores and selects the most probable translation from a set of candidate translations for each source sentence.
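In symbols, using notation common to log-linear and MERT presentations (feature functions h_m, weights lambda_1..lambda_M, a candidate set C_s for each source sentence f_s, references r_s, an error function E, and a smoothing exponent alpha; these symbols are this summary's choice, not a quotation from the paper), the decision rule and the two training criteria can be written as:

    \hat{e}(f_s; \lambda_1^M) = \operatorname*{argmax}_{e \in C_s} \sum_{m=1}^{M} \lambda_m h_m(e, f_s)

    \hat{\lambda}_1^M = \operatorname*{argmin}_{\lambda_1^M} \sum_{s=1}^{S} E\bigl(r_s, \hat{e}(f_s; \lambda_1^M)\bigr)    (unsmoothed error count)

    \hat{\lambda}_1^M = \operatorname*{argmin}_{\lambda_1^M} \sum_{s=1}^{S} \sum_{e \in C_s} E(r_s, e)\, \frac{p_{\lambda}(e \mid f_s)^{\alpha}}{\sum_{e' \in C_s} p_{\lambda}(e' \mid f_s)^{\alpha}}    (smoothed error count)

The unsmoothed objective is piecewise constant in the weights, since only finitely many candidates can be selected; this is what makes a naive grid search unstable and what the exact line search exploits. A code sketch of such a line search is given at the end of this summary.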
The baseline translation system uses feature functions derived from probabilistic models, and candidate translations are generated with a dynamic programming beam search decoder. Experiments show that minimum error rate training yields better translation quality on unseen test data than standard maximum mutual information (MMI) training, and that a given evaluation criterion (WER, BLEU, or NIST) is generally optimized best by training directly on that same criterion. The smoothed error count gives results almost identical to the unsmoothed error count, suggesting that, with only a small number of parameters being trained, no serious overfitting occurs.

The paper concludes that the proposed training criteria are directly related to translation quality and can improve machine translation performance. The authors caution, however, that the approach places high demands on the fidelity of the evaluation measure being optimized, underscoring the importance of developing better automatic evaluation criteria for machine translation.
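To make the optimization idea concrete, below is a minimal, self-contained Python sketch of the kind of exact one-dimensional line search that unsmoothed minimum error rate training performs over fixed candidate lists. The data layout (per-sentence lists of (feature_vector, error_count) pairs), the function names, and the restriction to a single coordinate direction are illustrative assumptions of this summary rather than the paper's implementation; a full MERT loop would repeat such searches over all parameters and periodically re-decode to refresh the candidate sets, and a corpus-level metric such as BLEU would require accumulating sufficient statistics instead of summing per-sentence errors.

def upper_envelope(lines):
    """lines: list of (slope, intercept, error) triples, one per candidate.

    Each candidate's model score is a line in gamma: intercept + gamma * slope.
    Returns the upper envelope as a list of (gamma_from, error) pairs: from
    gamma_from up to the next entry's gamma_from, the listed candidate is the
    highest-scoring one.  The first entry has gamma_from == -inf.
    """
    # For equal slopes, only the largest intercept can ever be on top.
    best_by_slope = {}
    for slope, intercept, err in sorted(lines, key=lambda t: (t[0], t[1])):
        best_by_slope[slope] = (slope, intercept, err)
    lines = [best_by_slope[s] for s in sorted(best_by_slope)]

    hull, starts = [], []  # envelope lines (increasing slope) and where each takes over
    for slope, intercept, err in lines:
        x = float("-inf")
        while hull:
            s0, b0, _ = hull[-1]
            x = (b0 - intercept) / (slope - s0)  # intersection with last hull line
            if x <= starts[-1]:
                hull.pop()    # the last hull line is never the maximum
                starts.pop()
                x = float("-inf")
            else:
                break
        hull.append((slope, intercept, err))
        starts.append(x)
    return [(start, line[2]) for start, line in zip(starts, hull)]


def line_search_coordinate(candidate_sets, weights, k):
    """Exactly minimize the total unsmoothed error as a function of weights[k],
    keeping all other weights fixed.

    candidate_sets: one list per source sentence, each a list of
                    (feature_vector, error_count) pairs for the candidates.
    Returns (best_value_for_weights_k, corpus_error_at_that_value).
    """
    events = []        # (gamma, change in corpus error) at argmax switch points
    total_error = 0.0  # corpus error as gamma -> -inf

    for candidates in candidate_sets:
        lines = []
        for feats, err in candidates:
            # Score as a function of gamma (the trial value of weights[k]):
            #   sum_{m != k} weights[m] * feats[m]  +  gamma * feats[k]
            offset = sum(w * h for i, (w, h) in enumerate(zip(weights, feats)) if i != k)
            lines.append((feats[k], offset, err))
        segments = upper_envelope(lines)
        total_error += segments[0][1]
        for (g, new_err), (_, old_err) in zip(segments[1:], segments[:-1]):
            events.append((g, new_err - old_err))

    # Sweep all switch points in order, tracking the interval of lowest error.
    events.sort()
    best_error = total_error
    best_gamma = (events[0][0] - 1.0) if events else weights[k]
    for i, (g, delta) in enumerate(events):
        total_error += delta
        if total_error < best_error:
            best_error = total_error
            right = events[i + 1][0] if i + 1 < len(events) else g + 1.0
            best_gamma = (g + right) / 2.0  # midpoint of the optimal interval
    return best_gamma, best_error


# Toy usage with two sentences and two hypothetical candidates each.
if __name__ == "__main__":
    cands = [
        [([1.0, 0.2], 2.0), ([0.5, 0.9], 1.0)],  # sentence 1: (features, errors)
        [([0.3, 0.1], 0.0), ([0.8, 0.7], 3.0)],  # sentence 2
    ]
    print(line_search_coordinate(cands, weights=[1.0, 1.0], k=1))

The design point mirrored here is that, along one search direction, every candidate's score is a line in the free parameter, so the best candidate for each sentence changes only at finitely many intersection points and the corpus error can be minimized exactly between them.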