7 Apr 2016 | Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer
This paper introduces two new neural architectures for named entity recognition (NER) that do not rely on language-specific resources or features beyond a small amount of supervised training data and unlabeled corpora. The first model, LSTM-CRF, combines bidirectional LSTMs with a conditional random field (CRF) layer to capture dependencies between tagging decisions for each token. The second model, S-LSTM, uses a transition-based approach inspired by shift-reduce parsers to construct and label segments of input sentences. Both models use character-based word representations learned from the supervised corpora together with word representations learned from unannotated corpora. Experiments in English, Dutch, German, and Spanish show that the LSTM-CRF model achieves state-of-the-art performance in Dutch, German, and Spanish, and comes very close to the state of the art in English, without any hand-engineered features or gazetteers. The S-LSTM model also surpasses previously published results in several languages. The paper discusses the architectures, training methods, and experimental results, highlighting the importance of character-level and word-level representations and of dropout training for generalization.
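For concreteness, here is a minimal PyTorch sketch of the encoder side of an LSTM-CRF tagger as described above: a character-level BiLSTM embedding concatenated with a pretrained word embedding, dropout applied to the combined representation, and a word-level BiLSTM producing per-token tag scores. The class name, layer sizes, and module structure are illustrative assumptions rather than the authors' code, and the CRF layer that the paper places on top (transition scores plus Viterbi decoding) is omitted for brevity.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Illustrative sketch (not the authors' implementation) of the
    LSTM-CRF encoder: char-level BiLSTM + pretrained word embeddings,
    dropout, word-level BiLSTM, and a linear layer giving per-token
    tag scores. A CRF over these scores would be added on top."""

    def __init__(self, n_chars, n_words, n_tags,
                 char_dim=25, char_hidden=25, word_dim=100, word_hidden=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim)  # init from pretrained vectors
        self.dropout = nn.Dropout(0.5)  # dropout on the concatenated embedding
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(2 * word_hidden, n_tags)

    def forward(self, char_ids, word_ids):
        # char_ids: (n_tokens, max_chars); word_ids: (n_tokens,)
        _, (h, _) = self.char_lstm(self.char_emb(char_ids))
        # final forward and backward char states -> (n_tokens, 2 * char_hidden)
        char_repr = torch.cat([h[0], h[1]], dim=-1)
        token_repr = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        token_repr = self.dropout(token_repr)
        out, _ = self.word_lstm(token_repr.unsqueeze(0))   # add batch dim
        return self.to_tags(out.squeeze(0))                # (n_tokens, n_tags) emission scores
```

The dropout placement reflects the paper's observation that dropout on the combined character- and word-level representation was important for generalization; the CRF layer would then score tag sequences over these emissions rather than picking each tag independently.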