[slides] Moses: Open Source Toolkit for Statistical Machine Translation

Moses is an open-source toolkit for statistical machine translation (SMT) developed by a team of researchers from several institutions. It comprises a decoder, training tools, and evaluation tools, and is designed as a complete out-of-the-box system for academic research, covering preprocessing, training, and evaluation of SMT systems. It relies on standard external tools for some tasks, such as GIZA++ for word alignment and SRILM for language modeling, and it can run in parallel environments to increase throughput.

The decoder is the core component of Moses. It was developed as a drop-in replacement for Pharaoh, the popular phrase-based decoder, with accessibility, ease of maintenance, flexibility, ease of distributed team development, and portability as guiding principles. It is implemented in C++ for efficiency and follows a modular, object-oriented design.

Moses supports factored translation models, in which surface forms may be augmented with additional, linguistically motivated factors such as POS tags or lemmas. This allows more accurate and flexible modeling of translation. The toolkit also supports confusion network decoding, which allows ambiguous input to be translated and enables tighter integration of speech recognition and machine translation. Efficient data formats for translation models and language models allow much larger data resources to be exploited on limited hardware.

The toolkit has been hosted and developed on sourceforge.net since its inception. Moses has an active research community and had reached over 1000 downloads as of March 1, 2007. The main online presence is at http://www.statmt.org/moses/.
Moses was the subject of this year’s Johns Hopkins University Workshop on Machine Translation.
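
To make the idea of factored input more concrete, the following is a minimal Python sketch of how a token carrying several factors could be represented and parsed. It assumes a `surface|lemma|pos` layout with `|` as the separator; the factor order, the names in `FACTOR_NAMES`, and both helper functions are illustrative assumptions, not part of the Moses code base (which is written in C++).

```python
from typing import Dict, List

# Assumed factor order for this sketch. The summary names POS tags and
# lemmas as example factors but does not prescribe an order.
FACTOR_NAMES = ("surface", "lemma", "pos")


def parse_factored_token(token: str, sep: str = "|") -> Dict[str, str]:
    """Split one factored token such as 'houses|house|NNS' into named factors."""
    parts = token.split(sep)
    if len(parts) != len(FACTOR_NAMES):
        raise ValueError(
            f"expected {len(FACTOR_NAMES)} factors, got {len(parts)}: {token!r}"
        )
    return dict(zip(FACTOR_NAMES, parts))


def parse_factored_sentence(line: str) -> List[Dict[str, str]]:
    """Parse a whitespace-separated sentence of factored tokens."""
    return [parse_factored_token(tok) for tok in line.split()]


# Hypothetical example sentence, annotated by hand for illustration.
sentence = parse_factored_sentence("the|the|DT houses|house|NNS")
lemmas = [tok["lemma"] for tok in sentence]
pos_tags = [tok["pos"] for tok in sentence]
```

Keeping each factor under its own key lets a model condition on, say, lemmas only or POS tags only, which is the kind of flexibility the factored approach is meant to provide.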
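
Confusion network decoding can be pictured with a small sketch as well. Here a confusion network is assumed to be a list of columns, each holding alternative words with posterior probabilities, as a speech recognizer might produce. The type alias `Column`, the function `enumerate_paths`, and the example probabilities are all hypothetical and chosen for illustration. The code only enumerates the input paths and their probabilities; a real decoder would combine these scores with translation and language model scores rather than fix a single best input first.

```python
from itertools import product
from math import prod
from typing import List, Tuple

# One column of the network: alternative words with their probabilities.
Column = List[Tuple[str, float]]


def enumerate_paths(network: List[Column]) -> List[Tuple[List[str], float]]:
    """List every path through the network, most probable first.

    A path picks exactly one word per column; its probability is the
    product of the chosen words' probabilities.
    """
    paths = []
    for choice in product(*network):
        words = [word for word, _ in choice]
        score = prod(prob for _, prob in choice)
        paths.append((words, score))
    return sorted(paths, key=lambda item: item[1], reverse=True)


# Hypothetical recognizer output with made-up probabilities.
network: List[Column] = [
    [("I", 0.9), ("eye", 0.1)],
    [("scream", 0.6), ("scheme", 0.4)],
    [("for", 0.7), ("four", 0.3)],
]

paths = enumerate_paths(network)
best_words, best_score = paths[0]
```

The number of paths grows exponentially with the number of columns, so exhaustive enumeration is only practical for toy inputs like this one.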

Moses: Open Source Toolkit for Statistical Machine Translation

June 2007 | Philipp Koehn, Marcello Federico, Brooke Cowan, Hieu Hoang, Nicola Bertoldi, Wade Shen, Alexandra Birch, Christine Moran, Chris Callison-Burch, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, Evan Herbst