10 Jun 2016 | Rico Sennrich, Barry Haddow, and Alexandra Birch
This paper presents a method for neural machine translation (NMT) that enables open-vocabulary translation by representing rare and unknown words as sequences of subword units. Traditional NMT models operate with fixed vocabularies, whereas this approach handles out-of-vocabulary words by breaking them into smaller units such as characters or morpheme-like segments. This is more effective than back-off dictionaries or very large vocabularies: it improves translation quality for rare words and allows the model to generate words never seen during training. The paper evaluates several subword segmentation techniques, including byte pair encoding (BPE), a compression algorithm adapted here for word segmentation. BPE yields a compact representation of an open vocabulary through variable-length character sequences. Experiments show that BPE-based models outperform back-off dictionaries on English→German and English→Russian translation, with improvements in BLEU and chrF3 scores. The analysis further demonstrates that subword models learn to translate and generate unseen words, particularly morphologically complex words and names. The paper concludes that subword representations are both simpler and more effective than previous approaches to open-vocabulary translation.
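To make the BPE adaptation concrete, here is a minimal Python sketch of the learning loop, along the lines of the toy example in the paper: words are represented as sequences of characters plus an end-of-word marker `</w>`, and the most frequent adjacent symbol pair is iteratively merged into a new symbol. The vocabulary counts and number of merges below are illustrative, not taken from the paper's experiments.

```python
import re
import collections

def get_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, v_in):
    """Replace every occurrence of the symbol pair with a single merged symbol."""
    v_out = {}
    bigram = re.escape(' '.join(pair))
    # Match the pair only at symbol boundaries (whitespace-delimited).
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    for word in v_in:
        v_out[pattern.sub(''.join(pair), word)] = v_in[word]
    return v_out

# Toy vocabulary: each word is a space-separated symbol sequence with
# an end-of-word marker; counts are illustrative.
vocab = {
    'l o w </w>': 5,
    'l o w e r </w>': 2,
    'n e w e s t </w>': 6,
    'w i d e s t </w>': 3,
}

num_merges = 10  # in practice, the number of merges is a hyperparameter
for _ in range(num_merges):
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_vocab(best, vocab)
    print(best)
```

At test time, the learned merge operations are applied in the order they were learned to segment new text, so a rare or unseen word decomposes into known subword units (in the limit, single characters), which is what lets the model translate and generate words outside its training vocabulary.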