Hierarchical Phrase-Based Translation

2007 | David Chiang
This paper presents a hierarchical phrase-based model for statistical machine translation (SMT) that combines ideas from syntax-based and phrase-based translation. The model is formally a synchronous context-free grammar (CFG), but it is learned from parallel text without syntactic annotations. It uses hierarchical phrases, that is, phrases that can contain subphrases, to capture complex syntactic relationships and improve translation accuracy. Evaluated against the Alignment Template System (ATS), a state-of-the-art phrase-based system, the model achieves significantly higher translation accuracy as measured by BLEU.
Translation rules are represented in a synchronous CFG, in which each rule has a pair of right-hand sides, one per language, whose nonterminal symbols are aligned with each other. The grammar is extracted automatically from word-aligned parallel text: initial phrase pairs that are consistent with the word alignments are identified first, and rules are then formed from them by replacing subphrases with nonterminal symbols. Constraints are applied to the extracted grammar to reduce ambiguity and improve performance. The grammar also includes glue rules, which allow it to divide a source sentence (called "French" by convention, with the target called "English") into chunks and translate them one at a time, and entity rules, which translate specific elements such as numbers, dates, and names.
The model is log-linear, with features such as lexical weights and penalties for extracted rules, and its parameters are learned using minimum-error-rate training. The decoder uses a CKY parser with beam search, together with a postprocessor that maps French (source) derivations to English (target) derivations, and it is optimized to calculate English language-model probabilities for candidate translations efficiently. The model is evaluated on Mandarin-to-English translation tasks and is found to perform better than phrase-based systems in large-scale evaluations.
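The extraction step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names (`is_consistent`, `initial_phrase_pairs`, `make_rule`), the `max_len` parameter, and the hand-written alignment links are assumptions made for illustration; the four-word fragment is only a toy example. The sketch enumerates phrase pairs that are consistent with a word alignment, then forms one hierarchical rule by replacing two subphrase pairs with co-indexed nonterminals.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]             # inclusive (start, end) word positions
Alignment = Set[Tuple[int, int]]   # (source position, target position) links
PhrasePair = Tuple[Span, Span]     # (source span, target span)


def is_consistent(align: Alignment, f_span: Span, e_span: Span) -> bool:
    """True if at least one link lies inside the pair and no link leaves it."""
    has_link = False
    for f, e in align:
        f_in = f_span[0] <= f <= f_span[1]
        e_in = e_span[0] <= e <= e_span[1]
        if f_in != e_in:  # the link crosses the boundary of the phrase pair
            return False
        has_link = has_link or f_in
    return has_link


def initial_phrase_pairs(f_len: int, e_len: int, align: Alignment,
                         max_len: int = 10) -> List[PhrasePair]:
    """Enumerate consistent phrase pairs of at most max_len words per side.

    max_len is an illustrative parameter, not a value taken from the text.
    """
    pairs = []
    for fs in range(f_len):
        for fe in range(fs, min(fs + max_len, f_len)):
            for es in range(e_len):
                for ee in range(es, min(es + max_len, e_len)):
                    if is_consistent(align, (fs, fe), (es, ee)):
                        pairs.append(((fs, fe), (es, ee)))
    return pairs


def make_rule(f_words: List[str], e_words: List[str], outer: PhrasePair,
              inners: List[PhrasePair]) -> Tuple[str, str]:
    """Replace each inner phrase pair with a co-indexed nonterminal X1, X2, ...

    Assumes the inner pairs lie inside `outer` and do not overlap (not checked).
    """
    def side(words: List[str], span: Span, subspans: List[Tuple[int, Span]]) -> str:
        starts = {sub[0]: (idx, sub[1]) for idx, sub in subspans}
        out, pos = [], span[0]
        while pos <= span[1]:
            if pos in starts:          # a subphrase begins here: emit X_idx
                idx, end = starts[pos]
                out.append(f"X{idx}")
                pos = end + 1
            else:                      # an ordinary terminal word
                out.append(words[pos])
                pos += 1
        return " ".join(out)

    f_subs = [(n, f) for n, (f, _) in enumerate(inners, 1)]
    e_subs = [(n, e) for n, (_, e) in enumerate(inners, 1)]
    return side(f_words, outer[0], f_subs), side(e_words, outer[1], e_subs)


# Toy fragment with hand-written (hypothetical) alignment links.
F = "yu Beihan you bangjiao".split()
E = "have diplomatic relations with North Korea".split()
ALIGN = {(0, 3), (1, 4), (1, 5), (2, 0), (3, 1), (3, 2)}

pairs = initial_phrase_pairs(len(F), len(E), ALIGN)
rule = make_rule(F, E, ((0, 3), (0, 5)), [((1, 1), (4, 5)), ((3, 3), (1, 2))])
print(rule)  # ('yu X1 you X2', 'have X2 with X1')
```

The resulting rule keeps the words "yu" and "you" as terminals and swaps the order of the two gaps on the target side, which is the kind of reordering a flat phrase pair cannot express in a reusable way.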
The system is implemented in Python with optimizations using Psyco and Pyrex.
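The log-linear scoring mentioned in the summary can also be sketched in plain Python, without Psyco or Pyrex. This is a hedged illustration of one common formulation, not the system's actual code. It assumes each rule's weight is a product of feature values raised to learned weights, combined with a weighted language-model probability and a word penalty. The feature names, feature values, and weights below are hypothetical; in the real system the weights would come from minimum-error-rate training.

```python
import math
from typing import Dict, List

Features = Dict[str, float]  # feature name -> feature value (a probability here)


def rule_log_weight(features: Features, weights: Dict[str, float]) -> float:
    """log w(rule) = sum_i lambda_i * log(phi_i), computed in log space."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


def derivation_log_score(rules: List[Features], weights: Dict[str, float],
                         lm_logprob: float, num_target_words: int) -> float:
    """Combine rule weights, language-model score, and a word penalty."""
    score = sum(rule_log_weight(r, weights) for r in rules)
    score += weights["lm"] * lm_logprob                  # language-model feature
    score -= weights["word_penalty"] * num_target_words  # length control
    return score


# Hypothetical weights and feature values, for illustration only.
WEIGHTS = {"p_e_given_f": 1.0, "lex_e_given_f": 0.5, "lm": 1.0, "word_penalty": 0.1}

candidate_a = [{"p_e_given_f": 0.5, "lex_e_given_f": 0.25}]
candidate_b = [{"p_e_given_f": 0.2, "lex_e_given_f": 0.25}]

score_a = derivation_log_score(candidate_a, WEIGHTS, math.log(0.01), 6)
score_b = derivation_log_score(candidate_b, WEIGHTS, math.log(0.01), 6)
best = "a" if score_a > score_b else "b"
print(best, round(score_a, 3), round(score_b, 3))
```

Working in log space turns the product of weighted features into a sum, which avoids numerical underflow when a derivation uses many rules.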