On Using Very Large Target Vocabulary for Neural Machine Translation


18 Mar 2015 | Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
This paper proposes a method for training neural machine translation (NMT) models with a very large target vocabulary without increasing training complexity. Building on the importance-sampling technique of Bengio and Sénécal (2008), which reduced the cost of computing the normalization constant in neural language models, the approach approximates the softmax normalizer using only a small sampled subset of the target vocabulary, so the per-update computational cost stays roughly constant even as the vocabulary grows.

On the WMT'14 English→French and English→German translation tasks, models trained with this approach match or outperform baseline models restricted to smaller target vocabularies. The paper also shows that decoding can be limited to a subset of candidate target words, which significantly improves translation speed without sacrificing quality. Using an ensemble of large-vocabulary models, the method achieves performance comparable to the state of the art on both WMT'14 tasks.
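To make the sampling idea concrete, the sketch below illustrates how a softmax normalizer can be estimated from a small candidate set instead of the full vocabulary. This is a minimal illustration, not the paper's exact scheme: it assumes a uniform proposal over the vocabulary (the paper partitions the training data and uses a data-dependent proposal whose importance weights simplify), and the names `logits_fn`, `sampled_softmax_nll`, and `num_samples` are hypothetical.

```python
import numpy as np

def sampled_softmax_nll(logits_fn, target_idx, vocab_size, num_samples=500, rng=None):
    """Approximate the negative log-likelihood of `target_idx` by sampling.

    Rather than summing exp(score) over the entire target vocabulary, the
    normalizer is estimated from a small sampled candidate set that always
    includes the correct word. This keeps the cost per update roughly
    constant regardless of vocabulary size.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Uniform proposal over the vocabulary (illustrative choice; under a
    # non-uniform proposal the log-proposal probabilities must be subtracted
    # from the scores before normalizing).
    samples = rng.choice(vocab_size, size=num_samples, replace=False)
    candidate_idx = np.unique(np.concatenate(([target_idx], samples)))

    # `logits_fn` is assumed to return unnormalized scores for the given word indices.
    scores = logits_fn(candidate_idx)

    # Normalize only over the sampled candidate set.
    log_norm = np.logaddexp.reduce(scores)
    target_score = scores[candidate_idx == target_idx][0]
    return -(target_score - log_norm)

# Example usage with random scores standing in for the decoder's output energies.
V = 50000
theta = np.random.default_rng(0).normal(size=V)
loss = sampled_softmax_nll(lambda idx: theta[idx], target_idx=123, vocab_size=V)
print(f"approximate NLL: {loss:.3f}")
```

The same idea carries over to decoding: restricting the softmax to a candidate list built from a short-list of frequent words plus likely translations of the source words gives the speedup described in the paper.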