HMM-Based Word Alignment in Statistical Translation


Stephan Vogel, Hermann Ney, Christoph Tillmann
This paper presents a new model for word alignment in statistical translation based on a first-order Hidden Markov Model (HMM). Unlike earlier approaches, the model makes alignment probabilities depend on the relative positions of words rather than on their absolute positions. The approach is inspired by the HMMs used for time alignment in speech recognition, but it drops the monotonicity constraint on word order. The model is evaluated on several bilingual corpora: the Avalanche Bulletins, the Verbmobil Corpus, and the Eu’Trans Corpus.

The paper first reviews the statistical translation model, in which a text is translated from one language into another by selecting the most probable target sentence. The key difficulty is modeling the correspondence, or alignment, between words in the source and target sentences. Two alignment models are described: a mixture-based model (IBM1), which uses a uniform alignment probability, and the HMM-based model, which conditions each alignment position on the previous one and can be trained efficiently with dynamic programming. Both models are trained by maximum likelihood estimation, and the HMM-based model yields translation probabilities comparable to those of the mixture model.
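In the paper's notation (reconstructed here, so the details should be checked against the original: f_1^J is the source sentence, e_1^I the target sentence, and a_j the target position aligned to source position j), the decision rule and the two alignment models take roughly the following form:

```latex
% Bayes decision rule: pick the most probable target sentence.
\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \; \Pr(e_1^I)\,\Pr(f_1^J \mid e_1^I)

% Mixture-based model (IBM1) with uniform alignment probability 1/I:
\Pr(f_1^J \mid e_1^I) = \prod_{j=1}^{J} \frac{1}{I} \sum_{i=1}^{I} p(f_j \mid e_i)

% HMM-based model: each alignment position a_j depends on a_{j-1}:
\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J}
    p(a_j \mid a_{j-1}, I)\; p(f_j \mid e_{a_j})

% Homogeneous transitions: only the jump width i - i' matters:
p(i \mid i', I) = \frac{c(i - i')}{\sum_{l=1}^{I} c(l - i')}
```

The last equation is what makes the alignment probabilities depend on relative rather than absolute positions: the transition weights c(·) are pooled over all absolute positions and normalized for each predecessor position i'.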
The HMM produces smoother position alignments, which is particularly useful for a language like German, where a single compound word must be aligned to several words in the source sentence. On the other hand, the model can struggle with large jumps in word position caused by word-order differences between the two languages. The paper discusses these limitations and suggests possible extensions, such as incorporating multi-word phrases and part-of-speech tags.

In the experiments, the HMM model performs well on the Avalanche and Verbmobil corpora, although the mixture model gives slightly better translation probabilities. The HMM model is judged more effective for aligning words in languages with complex structures, while the mixture model handles large jumps in word position better. The study concludes that further work is needed to improve the HMM's handling of large jumps, for example by developing a multilevel HMM that allows a limited number of large jumps.
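To make the dynamic-programming training mentioned above concrete, below is a minimal sketch of the forward recursion for the HMM alignment model. This is an illustration written for this summary, not the authors' implementation; the function name, the uniform initial distribution, and the toy parameters are all assumptions.

```python
import numpy as np

def hmm_alignment_likelihood(lex_prob, jump_score, I, J):
    """Forward algorithm for the first-order HMM alignment model.

    lex_prob[j, i]  -- lexicon probability p(f_j | e_i), shape (J, I)
    jump_score[d]   -- nonnegative score c(d) for a jump of width d,
                       stored for d = -(I-1)..(I-1) at index d + (I-1)
    Returns Pr(f_1^J | e_1^I) = sum over all alignments of
    prod_j p(a_j | a_{j-1}, I) * p(f_j | e_{a_j}).
    """
    # Transition matrix p(i | i', I): normalize jump scores per predecessor i'.
    p_jump = np.empty((I, I))
    for i_prev in range(I):
        scores = np.array([jump_score[(i - i_prev) + (I - 1)] for i in range(I)])
        p_jump[i_prev] = scores / scores.sum()

    # Forward recursion: alpha[i] = Pr(f_1..f_j, a_j = i | e_1^I).
    alpha = np.full(I, 1.0 / I) * lex_prob[0]   # uniform start (assumption)
    for j in range(1, J):
        alpha = (alpha @ p_jump) * lex_prob[j]
    return alpha.sum()

# Toy usage: 3 target words, 4 source words, random parameters.
rng = np.random.default_rng(0)
I, J = 3, 4
lex = rng.random((J, I))
jumps = rng.random(2 * I - 1)   # scores for jump widths -(I-1)..(I-1)
print(hmm_alignment_likelihood(lex, jumps, I, J))
```

The same table-filling structure with max in place of sum yields the most probable (Viterbi) alignment, and the forward-backward variant supplies the expected counts needed for the maximum likelihood (EM) training described in the paper.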