Discriminative Training and Maximum Entropy Models for Statistical Machine Translation


July 2002 | Franz Josef Och and Hermann Ney
This paper presents a framework for statistical machine translation (SMT) based on direct maximum entropy models, which includes the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions that depend on the source sentence, the target sentence, and possible hidden variables, so a baseline SMT system can be extended simply by adding new feature functions. The authors show that this approach yields significant improvements over their baseline system.

In the source-channel approach, a source sentence $f_1^J$ is translated by choosing the target sentence $e_1^I$ that maximizes the product of the language model and the translation model:

$$\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \, \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)$$

This approach has several limitations: the decision rule is optimal only if the component models are the true probability distributions, and the model is difficult to extend with new dependencies.

The alternative is a direct maximum entropy translation model, which models the posterior probability $\Pr(e_1^I \mid f_1^J)$ directly as a log-linear combination of feature functions $h_m$ with scaling factors $\lambda_m$:

$$\Pr(e_1^I \mid f_1^J) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)\right)}{\sum_{e'^{I'}_1} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e'^{I'}_1, f_1^J)\right)}$$

Because arbitrary feature functions can be combined in this way, the model is more flexible than the source-channel formulation and gives better performance.
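To make the decision rule concrete, here is a minimal Python sketch of choosing a translation under the log-linear model above. The candidates, feature values, and scaling factors are hypothetical placeholders rather than components of the authors' system; since the normalization term is constant over candidates for a fixed source sentence, the argmax only needs the weighted feature sums.

```python
# Minimal sketch of the log-linear decision rule: choose the candidate
# translation e maximizing sum_m lambda_m * h_m(e, f). The normalization
# term Z(f) is identical for all candidates, so it can be dropped.

def log_linear_score(features, lambdas):
    """Weighted feature sum: sum_m lambda_m * h_m(e, f)."""
    return sum(lam * h for lam, h in zip(lambdas, features))

def decide(candidates, lambdas):
    """Return the candidate with the highest posterior Pr(e | f)."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], lambdas))

# Hypothetical candidates for one source sentence; each feature vector
# could hold e.g. a language model log-probability, a translation model
# log-probability, and a word-count (sentence length) feature.
candidates = [
    {"target": "the house is small", "features": [-4.2, -6.1, 4.0]},
    {"target": "the house is little", "features": [-5.0, -5.8, 4.0]},
]
lambdas = [1.0, 1.0, -0.3]  # model scaling factors lambda_m

print(decide(candidates, lambdas)["target"])
```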
The paper also introduces alignment templates: pairs of source- and target-language phrases together with an alignment between their words. These templates are used to refine the translation probability. Within the direct maximum entropy model, the authors combine multiple feature functions, including sentence-length features, additional language models, and lexical features; these improve translation quality by taking word context and grammatical dependencies into account.

The scaling factors $\lambda_m$ are trained with the GIS (Generalized Iterative Scaling) algorithm. The authors also address the problem of reference translations, using multiple references per source sentence to make the evaluation of translation results more reliable.

In the experiments, the direct maximum entropy approach outperforms the source-channel approach across the reported evaluation measures, including SER, WER, PER, mWER, BLEU, and IER.

The paper concludes that the direct maximum entropy approach is more general and flexible than the source-channel approach, since new features can be added easily. It also highlights two open problems: handling complex features during search, and optimizing the model parameters directly with respect to the error rate. The authors suggest that further research is needed on both. The sketches below illustrate some of the components discussed above.
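As a picture of what an alignment template contains, here is a minimal sketch of one possible representation: a source phrase, a target phrase, and a set of position links between their words. The class layout and the German-English example are illustrative assumptions, not the paper's data structures.

```python
from dataclasses import dataclass

# One possible representation of an alignment template: a pair of phrases
# plus word-alignment links between their positions (illustrative only).

@dataclass(frozen=True)
class AlignmentTemplate:
    source: tuple     # source-language word (or word-class) sequence
    target: tuple     # target-language word (or word-class) sequence
    links: frozenset  # (source_position, target_position) alignment pairs

# Hypothetical template covering a two-word phrase pair.
template = AlignmentTemplate(
    source=("das", "haus"),
    target=("the", "house"),
    links=frozenset({(0, 0), (1, 1)}),
)
print(template.target)  # ('the', 'house')
```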
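To make the training step concrete, the following sketch performs one GIS update on the scaling factors $\lambda_m$. It assumes, as GIS requires, non-negative features whose sum is padded to a constant $C$ by a slack feature; the training events, candidate translations, and the `is_reference` flag are hypothetical.

```python
import math

def posteriors(cands, lambdas):
    """Model posteriors Pr(e | f) over the candidates of one source sentence."""
    scores = [math.exp(sum(lam * h for lam, h in zip(lambdas, c["features"])))
              for c in cands]
    z = sum(scores)
    return [s / z for s in scores]

def gis_step(data, lambdas, C):
    """One Generalized Iterative Scaling update of the scaling factors."""
    M = len(lambdas)
    observed = [0.0] * M   # empirical feature counts (reference translations)
    expected = [0.0] * M   # model-expected feature counts
    for cands in data:
        post = posteriors(cands, lambdas)
        for c, p in zip(cands, post):
            for m in range(M):
                expected[m] += p * c["features"][m]
                if c["is_reference"]:
                    observed[m] += c["features"][m]
    # GIS update: lambda_m += (1/C) * log(observed_m / expected_m)
    return [lam + (1.0 / C) * math.log(observed[m] / expected[m])
            for m, lam in enumerate(lambdas)]

# Two hypothetical training events; every candidate's three non-negative
# features sum to C = 6 (slack feature already folded in).
data = [
    [{"features": [3.0, 2.0, 1.0], "is_reference": True},
     {"features": [1.0, 2.0, 3.0], "is_reference": False}],
    [{"features": [2.0, 2.0, 2.0], "is_reference": True},
     {"features": [4.0, 1.0, 1.0], "is_reference": False}],
]
print(gis_step(data, [0.0, 0.0, 0.0], C=6.0))
```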
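As a small illustration of two of the evaluation measures listed above, this sketch computes WER as word-level Levenshtein distance normalized by the reference length, and mWER as the minimum WER over several references. The sentences are toy examples, and this is a common formulation of the measures rather than the authors' exact evaluation code.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance via a rolling dynamic-programming row."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (h != r))   # substitution (or match)
    return d[-1]

def wer(hyp, ref):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(hyp, ref) / len(ref)

def mwer(hyp, refs):
    """Multi-reference WER: score against the closest reference."""
    return min(wer(hyp, ref) for ref in refs)

hyp = "the house is little".split()
refs = ["the house is small".split(), "the home is small".split()]
print(mwer(hyp, refs))  # 0.25: one substitution against the closest reference
```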