May-June 2003 | Philipp Koehn, Franz Josef Och, Daniel Marcu
The paper introduces a new phrase-based translation model and decoding algorithm that make it possible to evaluate and compare several previously proposed phrase-based translation models. Within this framework, the authors run a large number of experiments to understand why phrase-based models outperform word-based models. Their empirical results, which hold for all examined language pairs, suggest that the highest performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Learning phrases longer than three words, or learning phrases from high-accuracy word-level alignment models, does not have a strong impact on performance, and learning only syntactically motivated phrases degrades performance. The paper also shows that the choice of heuristic for combining the word alignments from both translation directions matters, and it compares different methods for learning phrase translations.
The findings are validated using additional language pairs, confirming the effectiveness of the proposed approach.
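To make the two "simple means" more concrete, here is a minimal Python sketch, written under stated assumptions rather than as the authors' implementation. It assumes a word alignment is already given as a set of `(e_index, f_index)` links (for example, the result of an alignment-combination heuristic). It collects every phrase pair consistent with that alignment up to a maximum length (`max_len=3`, mirroring the three-word finding), estimates phrase translation probabilities by relative frequency, and computes a lexical weight by averaging word translation probabilities `w(f|e)` over the English words each foreign word is aligned to, using `w(f|NULL)` for unaligned foreign words. The function names `extract_phrases`, `is_consistent`, `phrase_table`, `lexical_weight` and the toy `w` table are hypothetical, and the brute-force span search is chosen for clarity, not speed.

```python
from collections import Counter


def is_consistent(alignment, e_start, e_end, f_start, f_end):
    """A span pair is consistent if it contains at least one alignment link
    and no link connects a word inside one span to a word outside the other."""
    has_link = False
    for i, j in alignment:  # i: English position, j: foreign position
        in_e = e_start <= i <= e_end
        in_f = f_start <= j <= f_end
        if in_e != in_f:
            return False
        has_link = has_link or in_e
    return has_link


def extract_phrases(e_words, f_words, alignment, max_len=3):
    """Brute-force sketch: try all span pairs of length <= max_len on each side."""
    pairs = []
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(e_start + max_len, len(e_words))):
            for f_start in range(len(f_words)):
                for f_end in range(f_start, min(f_start + max_len, len(f_words))):
                    if is_consistent(alignment, e_start, e_end, f_start, f_end):
                        pairs.append((" ".join(e_words[e_start:e_end + 1]),
                                      " ".join(f_words[f_start:f_end + 1])))
    return pairs


def phrase_table(pairs):
    """Relative frequency: phi(f|e) = count(e, f) / sum over f' of count(e, f')."""
    pair_counts = Counter(pairs)
    e_counts = Counter(e for e, _ in pairs)
    return {(e, f): c / e_counts[e] for (e, f), c in pair_counts.items()}


def lexical_weight(e_phrase, f_phrase, links, w):
    """p_w(f|e, a): product over foreign words of the average w(f|e) over the
    English words each foreign word is linked to; unaligned foreign words use
    w(f|NULL). `links` are (e_index, f_index) pairs relative to the phrase pair;
    `w` maps (foreign_word, english_word) to a probability (assumed given)."""
    result = 1.0
    for j, f in enumerate(f_phrase):
        aligned = [i for (i, jj) in links if jj == j]
        if aligned:
            result *= sum(w[(f, e_phrase[i])] for i in aligned) / len(aligned)
        else:
            result *= w[(f, "NULL")]
    return result


if __name__ == "__main__":
    e = "the house is small".split()
    f = "das haus ist klein".split()
    alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
    pairs = extract_phrases(e, f, alignment)
    table = phrase_table(pairs)
    print(len(pairs), table[("the house", "das haus")])  # 9 1.0
    w = {("das", "the"): 0.7, ("haus", "house"): 0.8}  # toy, hypothetical values
    print(lexical_weight(["the", "house"], ["das", "haus"], {(0, 0), (1, 1)}, w))
```

Because consistency only forbids links that cross a span boundary, phrase pairs that absorb unaligned boundary words (for example "house" paired with "das haus" when "das" is unaligned) are extracted automatically. A real system would collect counts over a whole aligned corpus and use a far more efficient extraction loop.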