25 Jul 2017 | Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
This paper introduces a fully convolutional architecture for sequence-to-sequence learning that outperforms strong recurrent models on large benchmark datasets while running roughly an order of magnitude faster. Replacing recurrent networks with convolutional networks allows computations over a sequence to be fully parallelized during training, and optimization is easier because the number of non-linearities each input passes through is fixed and independent of sequence length. Gated linear units ease gradient propagation, and each decoder layer is equipped with its own attention module.

Evaluated on several large machine translation and summarization datasets, the model sets a new state of the art on WMT'16 English-Romanian and outperforms the deep LSTM setup of Wu et al. (2016) on WMT'14 English-German and WMT'14 English-French, both in accuracy and in translation speed on GPU and CPU hardware.

The convolutional structure lets attention be computed as a single batched operation over all elements of a sequence, and the separate attention module in each decoder layer acts as a multi-step mechanism, giving the decoder multiple attention 'hops' per time step. Residual connections and careful weight initialization further stabilize learning.
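To make the decoder mechanics concrete, below is a minimal sketch of one decoder layer in the spirit of the paper: a causal 1D convolution with a gated linear unit, one attention 'hop' over the encoder outputs, and a scaled residual connection. This is an illustration written in PyTorch under my own assumptions, not the authors' fairseq implementation; the class name ConvGLUAttentionBlock and its parameters are hypothetical.

```python
# Illustrative sketch of a ConvS2S-style decoder layer (not the authors' code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLUAttentionBlock(nn.Module):
    def __init__(self, d_model, kernel_size=3, embed_dim=None):
        super().__init__()
        embed_dim = embed_dim or d_model
        # The convolution outputs 2*d channels so the GLU can halve them back to d.
        self.conv = nn.Conv1d(d_model, 2 * d_model, kernel_size,
                              padding=kernel_size - 1)
        # Projections into/out of the embedding space used for attention.
        self.to_embed = nn.Linear(d_model, embed_dim)
        self.from_embed = nn.Linear(embed_dim, d_model)

    def forward(self, x, target_embed, enc_keys, enc_values):
        # x:            (batch, tgt_len, d_model) decoder states from the layer below
        # target_embed: (batch, tgt_len, embed_dim) target-side input embeddings
        # enc_keys:     (batch, src_len, embed_dim) encoder outputs (attention keys)
        # enc_values:   (batch, src_len, embed_dim) attention values
        residual = x

        # Causal convolution followed by a gated linear unit.
        h = self.conv(x.transpose(1, 2))         # (batch, 2*d, tgt_len + k - 1)
        h = h[:, :, : x.size(1)]                 # trim right padding to keep causality
        h = F.glu(h, dim=1).transpose(1, 2)      # (batch, tgt_len, d_model)

        # One attention "hop" for this layer: combine the conv output with the
        # target embedding, score against encoder keys, and sum the values.
        # This is a single batched matrix product over the whole sequence.
        query = self.to_embed(h) + target_embed
        scores = torch.bmm(query, enc_keys.transpose(1, 2))
        attn = F.softmax(scores, dim=-1)
        context = torch.bmm(attn, enc_values)    # (batch, tgt_len, embed_dim)

        # Residual connection; scaling by sqrt(0.5) keeps the variance of the sum
        # roughly constant, in line with the paper's careful initialization scheme.
        return (h + self.from_embed(context) + residual) * math.sqrt(0.5)
```

Stacking several such layers is what yields the multi-step attention described above: each layer re-attends to the source conditioned on what previous layers already attended to.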
Overall, the convolutional model beats previous approaches in both accuracy and speed, and ensembling several models improves performance further. On abstractive summarization it also outperforms prior models in ROUGE scores. The authors conclude that a fully convolutional architecture is a promising approach for sequence-to-sequence learning, particularly for tasks that benefit from hierarchical representations.