23 Apr 2018 | Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
QANet is an end-to-end machine reading comprehension model that does away with recurrent networks entirely, replacing them with convolution and self-attention: convolutions capture local interactions, while self-attention models global interactions. Because the resulting architecture is purely feedforward, it is well suited to parallel computation. It trains up to 13x faster and runs inference up to 9x faster per iteration than a competitive recurrent model at equivalent accuracy, and on the SQuAD test set it reaches an F1 score of 84.6, significantly above the best published result of 81.8.
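To make the architecture concrete, here is a minimal sketch of one QANet-style encoder block in PyTorch. It is an illustration under assumptions rather than the authors' released implementation: the defaults (d_model=128, kernel size 7, 8 attention heads, depthwise separable convolutions, pre-layernorm residual sublayers) follow the paper's description, but positional encodings, dropout, and padding masks are omitted.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable 1-D convolution, the cheap convolution variant
    the paper uses inside its encoder blocks."""
    def __init__(self, d_model, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        x = x.transpose(1, 2)                  # Conv1d expects (batch, channels, seq)
        x = self.pointwise(self.depthwise(x))
        return x.transpose(1, 2)

class EncoderBlock(nn.Module):
    """One encoder block: a stack of convolutions for local structure,
    then self-attention for global structure, then a feedforward layer.
    Each sublayer is wrapped in pre-layernorm plus a residual connection."""
    def __init__(self, d_model=128, num_convs=4, kernel_size=7, num_heads=8):
        super().__init__()
        self.conv_norms = nn.ModuleList(
            [nn.LayerNorm(d_model) for _ in range(num_convs)])
        self.convs = nn.ModuleList(
            [DepthwiseSeparableConv(d_model, kernel_size) for _ in range(num_convs)])
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))

    def forward(self, x):
        for norm, conv in zip(self.conv_norms, self.convs):
            x = x + conv(norm(x))              # local interactions
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h)[0]          # global interactions
        return x + self.ff(self.ff_norm(x))

# Example: encode a batch of 2 sequences of length 50.
block = EncoderBlock()
out = block(torch.randn(2, 50, 128))           # -> (2, 50, 128)
```

Unlike a recurrent encoder, nothing here depends on the previous time step, which is what makes the whole block parallelizable across sequence positions.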
The speedup also makes it practical to train on much more data. To exploit this, a data augmentation technique is proposed that paraphrases training examples by backtranslation: sentences are translated from English to another language and back again with neural machine translation models, which both enlarges the number of training instances and diversifies their phrasing. Training on data generated this way further improves performance, and the model carries over to other datasets such as TriviaQA with similar gains in accuracy and speed. The work is the first to combine self-attention and convolutions for reading comprehension, a combination that proves empirically effective and yields a gain of 2.7 F1, and the first to use translation to and from another language as a way of paraphrasing the questions and contexts in the training data.
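The augmentation pipeline can be sketched as below. This is a hedged illustration, not the paper's code: translate is a hypothetical stand-in for any NMT system that returns the top-k beam hypotheses, and the pivot language "fr" is an assumption for the example.

```python
import random
from typing import Callable, List

# Hypothetical stand-in: translate(text, src_lang, tgt_lang, k) returns the
# top-k beam-search translations from any NMT system.
Translate = Callable[[str, str, str, int], List[str]]

def backtranslate(sentence: str, translate: Translate,
                  pivot: str = "fr", k: int = 5) -> List[str]:
    """Paraphrase an English sentence by round-tripping it through a pivot
    language. With beam size k in both directions this yields up to k*k
    candidates, from which duplicates and the original are dropped."""
    paraphrases = []
    for foreign in translate(sentence, "en", pivot, k):    # en -> pivot
        for back in translate(foreign, pivot, "en", k):    # pivot -> en
            if back != sentence and back not in paraphrases:
                paraphrases.append(back)
    return paraphrases

def augment_context(context_sentences: List[str],
                    translate: Translate) -> List[str]:
    """Build one paraphrased context by replacing each sentence with a
    randomly chosen backtranslation (falling back to the original)."""
    new_context = []
    for sent in context_sentences:
        candidates = backtranslate(sent, translate)
        new_context.append(random.choice(candidates) if candidates else sent)
    return new_context
```

One step the sketch leaves out: after paraphrasing a context, the gold answer span may no longer appear verbatim, so the full method must also recover a new answer span inside the paraphrased text.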