9 Jun 2015 | Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton
This paper presents a domain-agnostic, attention-enhanced sequence-to-sequence model for syntactic constituency parsing. Trained on a large synthetic corpus annotated with existing parsers, it achieves state-of-the-art results; trained only on a small human-annotated dataset, it still matches standard parsers, demonstrating high data efficiency. The parser is also fast, processing over 100 sentences per second with an unoptimized CPU implementation.
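Because parsing is cast as sequence-to-sequence transduction, trees have to be written out as flat token sequences that the decoder can emit one symbol at a time. The sketch below illustrates that depth-first linearization, with closing brackets annotated by their nonterminal; the Tree class and function name are illustrative, and details such as punctuation handling and the paper's POS-tag normalization are simplified here.

```python
# Minimal sketch (not the authors' code): turning a constituency tree into
# the bracketed token sequence a sequence-to-sequence parser can emit.

class Tree:
    def __init__(self, label, children=None, word=None):
        self.label = label              # nonterminal or POS tag
        self.children = children or []
        self.word = word                # set only for leaves

def linearize(tree):
    """Depth-first traversal producing tokens like '(S', 'XX', ')S'."""
    if tree.word is not None:
        # Leaf POS tags are collapsed to a generic 'XX' token in this sketch;
        # the paper recovers fine-grained tags separately.
        return ["XX"]
    tokens = ["(" + tree.label]
    for child in tree.children:
        tokens.extend(linearize(child))
    tokens.append(")" + tree.label)
    return tokens

# Toy example: "John has a dog ."
example = Tree("S", [
    Tree("NP", [Tree("NNP", word="John")]),
    Tree("VP", [
        Tree("VBZ", word="has"),
        Tree("NP", [Tree("DT", word="a"), Tree("NN", word="dog")]),
    ]),
    Tree(".", word="."),
])
print(" ".join(linearize(example)))
# -> (S (NP XX )NP (VP XX (NP XX XX )NP )VP XX )S
```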
The model is a sequence-to-sequence LSTM with an attention mechanism that lets the decoder focus on the relevant parts of the input while generating the output. Attention is what allows the model to handle long sequences effectively, and it improves performance over sequence-to-sequence models without it. Tested on the WSJ benchmark, the model achieves an F1 score of 92.5 on section 23, a state-of-the-art result.
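As a rough illustration of what one attention step computes — a weighted sum of encoder states scored against the current decoder state — here is a small NumPy sketch. The weight matrices, shapes, and random values are placeholders, not the trained model's parameters.

```python
import numpy as np

def attention_step(h, d_t, W1, W2, v):
    """One decoder step of additive attention over the encoder states.

    h   : (T_in, H) encoder hidden states, one per input token
    d_t : (H,)      decoder hidden state at the current output position
    """
    scores = np.tanh(h @ W1.T + d_t @ W2.T) @ v   # (T_in,) unnormalized scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over input positions
    context = weights @ h                         # (H,) weighted sum of encoder states
    return context, weights

# Toy shapes and random weights, purely for illustration.
H, T_in = 8, 5
rng = np.random.default_rng(0)
h = rng.normal(size=(T_in, H))
d_t = rng.normal(size=H)
W1, W2, v = rng.normal(size=(H, H)), rng.normal(size=(H, H)), rng.normal(size=H)
context, weights = attention_step(h, d_t, W1, W2, v)
# 'context' is combined with d_t before predicting the next output token.
```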
The model was also evaluated on out-of-domain data, including the Question Treebank and the English Web Treebank, and generalized well beyond the newswire text it was trained on. Its scores on these datasets exceeded previously reported results, indicating that it handles different types of text effectively.
The paper also analyzes how various factors affect performance, including pre-trained word vectors, reversing the input sentence, and the number of LSTM layers. The ablations show that the model is highly data-efficient and achieves good performance with relatively little training data.
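To make two of these ablated factors concrete — feeding the source sentence in reverse order and initializing the embedding table from pre-trained word vectors — here is a small preprocessing sketch. The helper names are illustrative and the "pre-trained" vectors below are random stand-ins, not real embeddings.

```python
import numpy as np

def reverse_input(tokens):
    """Reverse the source sentence; the target tree sequence is left as-is."""
    return list(reversed(tokens))

def init_embeddings(vocab, pretrained, dim, rng):
    """Start from pre-trained vectors where available, random values otherwise."""
    emb = rng.normal(scale=0.1, size=(len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in pretrained:
            emb[i] = pretrained[word]
    return emb

rng = np.random.default_rng(0)
vocab = ["<unk>", "John", "has", "a", "dog", "."]
pretrained = {"dog": rng.normal(size=64)}   # stand-in for real word vectors
emb = init_embeddings(vocab, pretrained, dim=64, rng=rng)
print(reverse_input(["John", "has", "a", "dog", "."]))
```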
The attention mechanism proved crucial for learning from small datasets, allowing the model to generalize far better than a plain sequence-to-sequence LSTM without attention. Being able to focus on the relevant parts of the input helps the model parse complex sentences and improves accuracy.
The paper concludes that domain-independent models with effective learning algorithms can match and even outperform domain-specific models in syntactic constituency parsing. The model's high data efficiency and speed make it a promising approach for future research in natural language processing.