Exploiting Similarities among Languages for Machine Translation

17 Sep 2013 | Tomas Mikolov, Quoc V. Le, Ilya Sutskever
This paper introduces a method for automating the generation and extension of dictionaries and phrase tables in statistical machine translation systems. The approach learns distributed representations of words and phrases from large monolingual data and maps between the vector spaces of two languages with a linear transformation. The method can translate word and phrase entries that are missing from existing resources, achieving nearly 90% precision@5 for English to Spanish translation. It applies to a variety of language pairs and assigns a translation score to every word pair, which helps in improving existing dictionaries and phrase tables. The paper also discusses the Skip-gram and CBOW models for learning word representations and evaluates the method on WMT11 datasets and on large-scale English-Spanish corpora. Finally, it explores the detection of dictionary errors and translation between distant language pairs, such as English and Vietnamese.
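The linear mapping described above can be illustrated with a minimal sketch. It assumes that word vectors for both languages are already trained and that a small seed dictionary of paired vectors is available; the synthetic vectors, the function names (fit_translation_matrix, translate, precision_at_k), and the closed-form least-squares fit via NumPy are illustrative choices made here and are not taken from the paper, which may use a different optimizer. Candidate translations are ranked by cosine similarity, and that similarity serves as the translation score.

```python
import numpy as np


def fit_translation_matrix(X, Z):
    """Fit W so that X @ W approximates Z in the least-squares sense.

    X: (n, d_src) source-language vectors from a seed dictionary.
    Z: (n, d_tgt) vectors of the corresponding target-language words.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W  # shape (d_src, d_tgt)


def translate(x, W, target_vectors, k=5):
    """Map one source vector and return the k nearest target rows.

    Returns (indices, scores), where scores are cosine similarities
    that can be read as translation scores.
    """
    z = x @ W
    z = z / np.linalg.norm(z)
    T = target_vectors / np.linalg.norm(target_vectors, axis=1, keepdims=True)
    scores = T @ z
    top = np.argsort(-scores)[:k]
    return top, scores[top]


def precision_at_k(X_test, gold, W, target_vectors, k=5):
    """Fraction of test words whose gold translation is in the top k."""
    hits = 0
    for x, g in zip(X_test, gold):
        top, _ = translate(x, W, target_vectors, k)
        hits += int(g in top)
    return hits / len(gold)


# Synthetic stand-in for real embeddings (an assumption for illustration):
# target vectors are a noisy linear image of the source vectors.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 8))
true_map = rng.normal(size=(8, 6))
tgt = src @ true_map + 0.01 * rng.normal(size=(200, 6))

# Use the first 150 pairs as the seed dictionary, the rest for evaluation.
W = fit_translation_matrix(src[:150], tgt[:150])
p5 = precision_at_k(src[150:], np.arange(150, 200), W, tgt, k=5)
print(f"precision@5 on held-out pairs: {p5:.2f}")
```

On real data, src and tgt would be replaced by embeddings trained separately on each language's monolingual corpus, and pairs with a low score could be flagged as possible dictionary errors.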