Exploiting Similarities among Languages for Machine Translation

17 Sep 2013 | Tomas Mikolov, Quoc V. Le, Ilya Sutskever
This paper presents a method for machine translation that leverages distributed word representations and linear mappings between languages. Large monolingual corpora are used to learn word representations, and a small bilingual dictionary is used to learn a linear mapping between the resulting vector spaces. Missing word and phrase entries can then be translated by projecting word vectors from the source language into the target language. The method is effective, reaching nearly 90% precision@5 for English-to-Spanish word translation, and because it makes no language-specific assumptions, it can be applied to any language pair.

The approach builds on distributed representations of words learned with the Skip-gram and Continuous Bag-of-Words (CBOW) models. Trained on large text corpora, these models capture linguistic regularities that make linear relationships between languages learnable. The Skip-gram model learns word representations by predicting the context of a word, while the CBOW model predicts a word from its context; both are efficient and can be trained on very large datasets.
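As a rough sketch of the monolingual step, word vectors for each language can be trained with an off-the-shelf word2vec implementation such as gensim. The corpus file name and hyperparameters below are illustrative placeholders, not the paper's exact configuration:

```python
# Minimal sketch of the monolingual step using gensim (4.x API).
# "en_corpus.txt" is a hypothetical tokenized corpus, one sentence
# per line; the hyperparameters are illustrative, not the paper's
# exact settings.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# sg=1 selects the Skip-gram objective (predict context words from
# the center word); sg=0 selects CBOW (predict the center word from
# its context).
model = Word2Vec(
    sentences=LineSentence("en_corpus.txt"),
    vector_size=300,   # dimensionality of the word vectors
    window=5,          # context window size
    sg=1,              # 1 = Skip-gram, 0 = CBOW
    min_count=5,       # ignore rare words
    workers=4,
)
model.wv.save("en_vectors.kv")  # keyed vectors for the mapping step
```

Running the same script on a target-language corpus (e.g. a hypothetical es_corpus.txt) yields the second monolingual model needed for the mapping step.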
The method involves two steps: first, building monolingual models of each language from large text corpora, and second, using a small bilingual dictionary to learn a linear projection between the languages. At test time, any word seen in the monolingual corpora can be translated by projecting its vector representation from the source-language space into the target-language space; the most similar word vector in the target space is taken as the translation.

Tested on the WMT11 datasets, the method shows high accuracy, particularly for frequent words, and outperforms baselines based on edit distance and word co-occurrence. It is also effective for translating infrequent words and for detecting dictionary errors, and it extends to distant language pairs such as English and Vietnamese. The authors attribute its effectiveness to the similar geometric arrangement of word vectors across languages, which makes an accurate linear mapping possible. The approach is complementary to existing methods and could improve current machine translation systems by providing a simple, effective way to generate and extend dictionaries and phrase tables.
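A minimal sketch of the second step follows, assuming the source and target vectors have already been trained (as above) and a small seed dictionary of (source, target) word pairs is available. The paper learns the translation matrix W by minimizing sum_i ||W x_i - z_i||^2 with stochastic gradient descent; the closed-form least-squares solve below is a simplification, and the function and variable names are hypothetical:

```python
# Sketch of the bilingual mapping step. src_vecs and tgt_vecs are
# dict-like maps from word to numpy vector (e.g. loaded from the
# keyed vectors saved above); seed_pairs is a small bilingual
# dictionary of (source_word, target_word) tuples.
import numpy as np

def learn_mapping(src_vecs, tgt_vecs, seed_pairs):
    """Learn W minimizing sum_i ||W x_i - z_i||^2 over the seed pairs.

    The paper optimizes this objective with stochastic gradient
    descent; a closed-form least-squares solve is used here for
    brevity.
    """
    X = np.stack([src_vecs[s] for s, t in seed_pairs])  # n x d_src
    Z = np.stack([tgt_vecs[t] for s, t in seed_pairs])  # n x d_tgt
    # Solve X @ W.T ~= Z in the least-squares sense, i.e. W x ~= z.
    W_T, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W_T.T                                        # d_tgt x d_src

def translate(word, src_vecs, tgt_vecs, W, k=5):
    """Project a source word into the target space and return the k
    nearest target words by cosine similarity. Precision@5 counts a
    hit if the reference translation appears among these k words."""
    z = W @ src_vecs[word]
    tgt_words = list(tgt_vecs.keys())
    M = np.stack([tgt_vecs[w] for w in tgt_words])
    sims = (M @ z) / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-12)
    return [tgt_words[i] for i in np.argsort(-sims)[:k]]
```

In the paper's setup the seed dictionary is small (on the order of a few thousand frequent word pairs), and every other word in the monolingual vocabulary can then be translated through the learned projection.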