18 Mar 2015 | Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
This paper addresses the challenge of handling large target vocabularies in neural machine translation (NMT), a recent approach to machine translation based on neural networks. NMT has shown promising results compared to traditional methods such as phrase-based statistical machine translation, but it struggles with large target vocabularies because training and decoding complexity grow with the vocabulary size. The authors propose a method based on importance sampling that allows efficient training and decoding with a very large target vocabulary without significantly increasing computational complexity. They demonstrate that their approach can match or surpass the performance of models trained with a small vocabulary, even when using an ensemble of models. The proposed method is evaluated on English→French and English→German translation tasks, achieving state-of-the-art performance in BLEU scores.
The paper also discusses the trade-offs between computational efficiency and translation quality, and provides insights into the practical implementation of the approach.
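
To make the core idea concrete, here is a minimal sketch of how a sampled normalizer can stand in for the full softmax over the target vocabulary. It is an illustration under stated assumptions, not the authors' implementation: the proposal distribution is assumed to be uniform over a sampled subset that always contains the correct word, in which case the importance-weighted normalizer reduces to a softmax restricted to that subset. All function names (`log_softmax_full`, `log_softmax_subset`, `sample_subset`) are hypothetical.

```python
import math
import random


def log_softmax_full(scores, target):
    """Exact log-probability of `target` under a softmax over all scores."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def log_softmax_subset(scores, target, subset):
    """Approximate log-probability whose normalizer sums over `subset` only.

    Assumption: `subset` contains `target` and was drawn uniformly, so the
    importance weights are proportional to exp(score) within the subset.
    """
    if target not in subset:
        raise ValueError("subset must contain the target word")
    sub_scores = [scores[i] for i in subset]
    m = max(sub_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in sub_scores))
    return scores[target] - log_z


def sample_subset(vocab_size, target, k, rng):
    """Return k distinct word indices: the target plus k - 1 uniform draws."""
    others = [i for i in range(vocab_size) if i != target]
    return [target] + rng.sample(others, k - 1)


# Illustrative usage with random scores standing in for model outputs.
rng = random.Random(0)
vocab_size = 1000
scores = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
target = 42
subset = sample_subset(vocab_size, target, 50, rng)
exact = log_softmax_full(scores, target)
approx = log_softmax_subset(scores, target, subset)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The cost of the approximate normalizer depends on the subset size rather than on the full vocabulary size, which is the efficiency gain the summary describes. Because the subset normalizer omits terms, the approximate log-probability is never lower than the exact one, and the two coincide when the subset covers the whole vocabulary.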