Neural Architectures for Named Entity Recognition

7 Apr 2016 | Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer
This paper introduces two neural architectures for Named Entity Recognition (NER): one based on bidirectional LSTMs combined with conditional random fields (LSTM-CRF), and another using a transition-based approach inspired by shift-reduce parsers. Both models combine character-based and word-based representations, capturing orthographic as well as distributional evidence, and achieve state-of-the-art performance in four languages without relying on language-specific resources such as gazetteers.

The LSTM-CRF model uses bidirectional LSTMs to capture contextual information about each word and a CRF layer to model dependencies between output tags. The transition-based model constructs and labels segments of the input sentence with a stack-based algorithm similar to transition-based dependency parsing. Both models use character-level embeddings together with pretrained word embeddings, and dropout training encourages them to balance their reliance on the two kinds of representation.

Experiments on English, Dutch, German, and Spanish show that the LSTM-CRF model achieves state-of-the-art results, while the transition-based model also performs competitively. The models outperform previous approaches by leveraging both orthographic and distributional evidence, without requiring external resources or hand-engineered features. The paper also examines the importance of word representations, showing that combining character-based and pretrained word embeddings improves performance, that dropout is crucial for generalization, and that the models train effectively with stochastic gradient descent and gradient clipping. The results demonstrate that the proposed architectures achieve high accuracy in NER across multiple languages.
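To make the LSTM-CRF word representation concrete, the following is a minimal sketch assuming PyTorch. The class name, layer dimensions, and dropout rate are illustrative choices rather than the paper's exact configuration, and padding/masking of character sequences as well as the CRF training objective are omitted.

    # Minimal sketch of the hierarchical word representation described above,
    # assuming PyTorch; dimensions and names are illustrative, not the paper's
    # exact hyperparameters. Padding/masking and the CRF objective are omitted.
    import torch
    import torch.nn as nn

    class BiLSTMEncoder(nn.Module):
        def __init__(self, n_chars, n_words, n_tags,
                     char_dim=25, char_hidden=25,
                     word_dim=100, word_hidden=100, dropout=0.5):
            super().__init__()
            # Character embeddings fed to a bidirectional LSTM; the final states
            # form an orthography-sensitive representation of each word.
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                     bidirectional=True, batch_first=True)
            # Word-level (distributional) embeddings; in practice these would be
            # initialized from pretrained vectors.
            self.word_emb = nn.Embedding(n_words, word_dim)
            # Dropout on the concatenated representation pushes the model to use
            # both the character-based and the pretrained word evidence.
            self.dropout = nn.Dropout(dropout)
            self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                     bidirectional=True, batch_first=True)
            # Per-token emission scores over the tag set; the CRF layer on top of
            # these scores (not shown) models dependencies between adjacent tags.
            self.emissions = nn.Linear(2 * word_hidden, n_tags)

        def forward(self, char_ids, word_ids):
            # char_ids: (n_tokens, max_word_len), word_ids: (1, n_tokens)
            _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
            # Concatenate the forward and backward final states per word.
            char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
            word_repr = self.word_emb(word_ids)
            combined = torch.cat([word_repr, char_repr.unsqueeze(0)], dim=-1)
            context, _ = self.word_lstm(self.dropout(combined))
            return self.emissions(context)   # (1, n_tokens, n_tags)

The CRF layer that the paper places on top of these emission scores assigns a sentence-level score to a tag sequence by summing the per-token emission scores and learned transition scores between adjacent tags; decoding then selects the highest-scoring sequence with dynamic programming.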