1999 | DANIEL M. BIKEL, RICHARD SCHWARTZ, RALPH M. WEISCHEDEL
This paper presents IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. The model was evaluated in English and Spanish, and on speech input. Results show that IdentiFinder performs better than other learning algorithms and is competitive with rule-based approaches on mixed case text, and superior when case information is unavailable. A controlled experiment showed that as little as 100,000 words of training data is sufficient to achieve performance around 90% on newswire.
IdentiFinder uses a hidden Markov model to recognize named entities, labeling each word either as part of a name of some class or as not part of a name. The model is trained on a variety of data, including text, speech, and OCR input. Within each name-class, a statistical bigram language model computes the likelihood of a word given the word that precedes it. The model is designed to handle multiple modalities, including mixed case, upper case, and speech formats.
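The idea of an HMM whose states are name-classes, each with its own bigram word model, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`train`, `log_prob`, `decode`), the add-one smoothing, the lowercasing of words, the toy training data, and the class labels are all assumptions made for illustration, and the sketch omits refinements such as word features and back-off that a full system would need.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical marker for sentence start / start of a name-class run


def train(tagged_sentences):
    """Count name-class transitions and per-class word bigrams."""
    trans = defaultdict(lambda: defaultdict(int))   # prev_class -> class -> count
    bigram = defaultdict(lambda: defaultdict(int))  # (class, prev_word) -> word -> count
    vocab = set()
    for sent in tagged_sentences:
        prev_cls, prev_word = START, START
        for word, cls in sent:
            w = word.lower()  # assumption: ignore case entirely
            trans[prev_cls][cls] += 1
            # The bigram context restarts whenever the name-class changes.
            ctx = prev_word if cls == prev_cls else START
            bigram[(cls, ctx)][w] += 1
            vocab.add(w)
            prev_cls, prev_word = cls, w
    classes = sorted({c for sent in tagged_sentences for _, c in sent})
    return trans, bigram, vocab, classes


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` under a count table."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def decode(words, model):
    """Viterbi search for the most likely name-class sequence."""
    trans, bigram, vocab, classes = model
    v = len(vocab) + 1  # one extra outcome reserved for unseen words
    # best[c] = (log score, class path) of the best path ending in class c
    best = {START: (0.0, [])}
    prev_word = START
    for word in words:
        w = word.lower()
        new_best = {}
        for cls in classes:
            candidates = []
            for prev_cls, (score, path) in best.items():
                ctx = prev_word if cls == prev_cls else START
                s = (score
                     + log_prob(trans.get(prev_cls, {}), cls, len(classes))
                     + log_prob(bigram.get((cls, ctx), {}), w, v))
                candidates.append((s, path + [cls]))
            new_best[cls] = max(candidates, key=lambda t: t[0])
        best, prev_word = new_best, w
    return max(best.values(), key=lambda t: t[0])[1]


# Toy training data (invented for this example).
train_data = [
    [("Mr.", "NOT-A-NAME"), ("Smith", "PERSON"),
     ("visited", "NOT-A-NAME"), ("Boston", "LOCATION")],
    [("Ms.", "NOT-A-NAME"), ("Jones", "PERSON"),
     ("visited", "NOT-A-NAME"), ("Paris", "LOCATION")],
]
model = train(train_data)
tags = decode("Mr. Smith visited Paris".split(), model)
print(list(zip("Mr. Smith visited Paris".split(), tags)))
```

Because the sketch lowercases every word, it relies only on word identity and context, which loosely mirrors the setting where case information is unavailable. A realistic system would need far more training data and a better smoothing scheme than add-one.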
The model was tested on English and Spanish data and performed well on both. It was also tested on different modalities, including upper case text and speech formats, and performed well even when case information was unavailable. The model proved efficient and effective, with performance comparable to handcrafted systems when trained on a small amount of data.
The paper also discusses other learning approaches to name-finding, including transformation-based learning and decision trees. These approaches were found to be less effective than IdentiFinder. The paper concludes that IdentiFinder is a promising approach to named entity recognition, with performance comparable to handcrafted systems and superior when case information is unavailable.