Andrew McCallum, Dayne Freitag, Fernando Pereira
This paper introduces Maximum Entropy Markov Models (MEMMs), a new sequence model for information extraction and segmentation. Unlike traditional Hidden Markov Models (HMMs), MEMMs allow observations to be represented as arbitrary, overlapping features such as word identity, capitalization, formatting, and part-of-speech tags. The model defines the conditional probability of a state sequence given an observation sequence, using the maximum entropy framework to fit exponential models for the probability of each state given the current observation and the previous state. The parameters are trained with generalized iterative scaling (GIS), which is similar in form and computational cost to the expectation-maximization (EM) algorithm.
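The per-state exponential models can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature templates, state names, and the single hand-set weight are assumptions chosen for the FAQ-segmentation setting (in the paper, the weights would be learned by GIS rather than set by hand).

```python
import math

# Sketch of an MEMM's local conditional model: for each source state s',
# a separate exponential model
#     P_{s'}(s | o) = exp(sum_a lambda_a * f_a(o, s)) / Z(o, s')
# over binary features f_a of the <observation, destination-state> pair.
# Feature templates and the weight below are illustrative assumptions.

def features(obs, state):
    """Active binary features for an <observation, destination-state> pair."""
    feats = []
    if "?" in obs:
        feats.append(("contains-question-mark", state))
    if obs[:1].isdigit():
        feats.append(("begins-with-number", state))
    if obs.strip() == "":
        feats.append(("blank-line", state))
    return feats

def transition_probs(weights, prev_state, obs, states):
    """P(s | s', o): a normalized exponential model per (s', o) pair."""
    scores = {
        s: math.exp(sum(weights.get((prev_state, f), 0.0)
                        for f in features(obs, s)))
        for s in states
    }
    z = sum(scores.values())  # normalizer Z(o, s')
    return {s: v / z for s, v in scores.items()}

states = ["head", "question", "answer", "tail"]
# One hand-set weight: from state "answer", a line containing "?" favors "question".
weights = {("answer", ("contains-question-mark", "question")): 2.0}
probs = transition_probs(weights, "answer", "What is wiretapping?", states)
```

Because each source state carries its own normalizer Z(o, s'), the probabilities over destination states always sum to one, and adding a new overlapping feature never requires re-deriving an independence assumption, in contrast to HMM emission models.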
MEMMs address two key limitations of traditional HMMs: first, they accommodate non-independent, difficult-to-enumerate observation features, and second, they directly solve the conditional problem of predicting the state sequence given the observation sequence, rather than maximizing the joint likelihood of observations and states. Concretely, the model represents the probability of reaching each state given an observation and the previous state, with these conditional probabilities specified by exponential models over arbitrary observation features.
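Prediction under such a conditional model can be sketched as a Viterbi-style recursion in which P(s | s', o) replaces the separate transition and emission terms of an HMM. The `trans` callable and the toy model below are illustrative assumptions, not the paper's code:

```python
# Viterbi-style decoding for a conditional sequence model: the model gives
# P(s | s', o) directly, so the recursion multiplies these conditional
# transition probabilities with no separate emission term.
# `trans` is an assumed callable: trans(prev_state, obs) -> {state: prob}.

def viterbi(observations, states, trans, start_state="start"):
    # delta holds the probability of the best path ending in each state;
    # back holds the back-pointers used to recover that path.
    delta = [trans(start_state, observations[0])]
    back = []
    for obs in observations[1:]:
        step, ptr = {}, {}
        for s in states:
            cand = {sp: delta[-1][sp] * trans(sp, obs)[s] for sp in states}
            best = max(cand, key=cand.get)
            step[s], ptr[s] = cand[best], best
        delta.append(step)
        back.append(ptr)
    # Trace the highest-probability path backwards.
    last = max(delta[-1], key=delta[-1].get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy transition model (ignores the previous state for simplicity):
def toy_trans(prev_state, obs):
    p = 0.9 if "?" in obs else 0.1
    return {"question": p, "answer": 1.0 - p}

labels = viterbi(["What is FTP?", "It transfers files."],
                 ["question", "answer"], toy_trans)
```

The recursion has the same time complexity as HMM Viterbi decoding; the only structural change is that the observation conditions the transition distribution itself.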
The paper presents positive experimental results on the segmentation of FAQs, showing that MEMMs increase both precision and recall, with precision improved by a factor of two. The model is evaluated on a dataset of 38 files from 7 Usenet multi-part FAQs, with each line labeled as head, question, answer, or tail. The results show that MEMMs outperform competing models, including traditional HMMs and a stateless maximum entropy model, at segmenting FAQs into questions and answers. The paper also discusses related work, covering other probabilistic models and non-probabilistic methods, and concludes that MEMMs offer a more effective approach to text-related tasks such as information extraction and segmentation.