10 Jun 2016 | Rico Sennrich, Barry Haddow, Alexandra Birch
This paper addresses the challenge of translating rare and unknown words in neural machine translation (NMT) models, which typically operate with a fixed vocabulary. The authors propose a method to make NMT models capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This approach is based on the intuition that various word classes can be translated using smaller units than words, such as names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). The paper discusses different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding (BPE) compression algorithm, which the authors adapt to word segmentation so that an open vocabulary can be represented through a fixed-size vocabulary of variable-length character sequences. Empirical results show that subword models improve over a back-off dictionary baseline on the WMT 15 English→German and English→Russian translation tasks by up to 1.1 and 1.3 BLEU, respectively. The paper concludes with a discussion of the effectiveness of subword segmentations and the potential for further improvements from bilingually informed segmentation algorithms.
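For concreteness, here is a minimal sketch of the BPE merge-learning step, closely following the Python listing the authors include in the paper (their Algorithm 1); the toy vocabulary and merge count are illustrative, not the paper's actual training data.

```python
import re
import collections

def get_stats(vocab):
    """Count the frequency of each adjacent symbol pair across the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, v_in):
    """Replace every occurrence of the given symbol pair with its merged form."""
    v_out = {}
    bigram = re.escape(' '.join(pair))
    # Match the pair only at symbol boundaries (not inside larger symbols).
    p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    for word in v_in:
        v_out[p.sub(''.join(pair), word)] = v_in[word]
    return v_out

# Toy word-frequency dictionary; words are pre-split into characters,
# with '</w>' marking the end of a word.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

num_merges = 10  # in practice, the number of merges is a hyperparameter
for _ in range(num_merges):
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)  # most frequent pair becomes a new symbol
    vocab = merge_vocab(best, vocab)
    print(best)
```

The merge operations learned on the training corpus are stored in order and then applied to segment any word, including unseen ones, into known subword units; the final symbol vocabulary size equals the initial character vocabulary plus the number of merge operations.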