WORD TRANSLATION WITHOUT PARALLEL DATA

30 Jan 2018 | Alexis Conneau*,†‡§, Guillaume Lample*,†§, Marc’Aurelio Ranzato†, Ludovic Denoyer§, Hervé Jégou†
This paper presents an unsupervised method for learning cross-lingual word embeddings without parallel data. The authors align two monolingual word-embedding spaces using adversarial training, followed by a refinement step based on the closed-form Procrustes solution. The approach outperforms existing supervised methods on several language pairs and tasks, including word translation, sentence translation retrieval, and cross-lingual word similarity, and is particularly effective for distant language pairs and low-resource scenarios, as demonstrated on English-Esperanto. The paper also introduces a cross-domain similarity local scaling (CSLS) metric that mitigates the hubness problem and improves word translation accuracy. The authors release high-quality dictionaries, word embeddings, and code for public use.
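The two ingredients named above have simple concrete forms. The Procrustes step finds the orthogonal matrix W that best maps anchor pairs from the source space onto the target space, which has a closed-form solution via SVD; CSLS rescales raw cosine similarities by each point's average similarity to its k nearest cross-lingual neighbors, penalizing "hub" vectors that are near everything. The sketch below, in NumPy, illustrates both ideas; function names, the choice of k, and the toy inputs are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W.T - Y||_F for paired rows of X and Y.

    Closed-form solution: W = U @ Vt, where U, S, Vt = SVD(Y.T @ X).
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def csls(mapped, tgt, k=10):
    """CSLS scores between mapped source vectors and target vectors.

    CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean
    cosine of x with its k nearest target neighbors (and symmetrically
    for r_S). Hub vectors with a high neighborhood density get penalized.
    """
    mn = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tn = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = mn @ tn.T
    # mean similarity of each source word to its k nearest target neighbors
    r_t = np.mean(np.sort(sims, axis=1)[:, -k:], axis=1, keepdims=True)
    # mean similarity of each target word to its k nearest source neighbors
    r_s = np.mean(np.sort(sims, axis=0)[-k:, :], axis=0, keepdims=True)
    return 2 * sims - r_t - r_s

# Toy check: recover a known random rotation from paired embeddings.
rng = np.random.default_rng(0)
d = 20
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
X = rng.normal(size=(100, d))                  # "source" embeddings
Y = X @ Q.T                                    # "target" = rotated source
W = procrustes(X, Y)                           # should recover Q exactly
```

In the full unsupervised pipeline of the paper, adversarial training first produces a rough W without any seed dictionary; Procrustes then refines it on high-confidence mutual-nearest-neighbor pairs, and CSLS replaces plain nearest-neighbor retrieval at translation time.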