May 1995 | Stephen Della Pietra† Vincent Della Pietra† John Lafferty‡
This paper presents a method for incrementally constructing random fields from training data. The approach builds increasingly complex fields by adding potential functions (features) supported by increasingly large subgraphs. Each feature has a weight that is trained to minimize the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm selects features incrementally, and an iterative scaling algorithm estimates the optimal weights.
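For orientation, feature-based field models of this kind take an exponential (Gibbs) form, and the weights are trained against a KL-divergence objective. The display below is a sketch in our own notation rather than a quotation from the paper: f_i are the features, λ_i their weights, q_0 an initial reference distribution, Z(λ) the normalizing constant, and p̃ the empirical distribution of the training data.

```latex
p_\lambda(\omega) \;=\; \frac{1}{Z(\lambda)}\,
  \exp\!\Bigl(\sum_i \lambda_i f_i(\omega)\Bigr)\, q_0(\omega),
\qquad
\lambda^\ast \;=\; \arg\min_\lambda \, D(\tilde p \,\|\, p_\lambda)
  \;=\; \arg\min_\lambda \sum_\omega \tilde p(\omega)\,
        \log \frac{\tilde p(\omega)}{p_\lambda(\omega)} .
```

Minimizing this divergence over the weights is equivalent to maximum-likelihood training of the field on the empirical distribution.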
The method differs from traditional computer vision applications of random fields in that the fields here are non-Markovian and have a large number of parameters to estimate. It is related to other learning methods such as decision trees and Boltzmann machines. The method is demonstrated on a natural language processing task, automatic word classification.
The algorithm incrementally constructs random fields by adding features that improve the model's fit to the data. The feature selection step evaluates the gain of each candidate feature, defined as the reduction in Kullback-Leibler divergence obtained when the feature is added to the field. The parameter estimation step uses an improved iterative scaling algorithm to adjust the weights of the features.
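A minimal sketch of this two-step loop is given below, assuming binary features and an explicitly computable model expectation; the helper names (`emp_expectation`, `model_expectation`, `fit_weights`) are placeholders rather than anything defined in the paper. Under these assumptions, the gain of a candidate whose weight alone is adjusted reduces to the KL divergence between two Bernoulli distributions, one at the feature's empirical expectation and one at its current model expectation.

```python
import numpy as np

def bernoulli_kl(p_emp, p_model, eps=1e-12):
    """KL divergence between Bernoulli(p_emp) and Bernoulli(p_model).

    For a binary candidate feature whose weight alone is tuned, this is the
    best achievable reduction in D(p_tilde || p), used here as the gain."""
    p_emp = np.clip(p_emp, eps, 1 - eps)
    p_model = np.clip(p_model, eps, 1 - eps)
    return (p_emp * np.log(p_emp / p_model)
            + (1 - p_emp) * np.log((1 - p_emp) / (1 - p_model)))

def induce_field(candidates, emp_expectation, model_expectation, fit_weights,
                 max_features=10, min_gain=1e-4):
    """Greedy field induction loop (sketch).

    candidates         : list of candidate binary feature functions
    emp_expectation(g) : expectation of g under the empirical distribution
    model_expectation(g, active, weights) : expectation under the current field
    fit_weights(active): re-estimates all weights, e.g. by iterative scaling
    """
    active, weights = [], []
    for _ in range(max_features):
        if not candidates:
            break
        # Feature selection: score each candidate by its approximate gain.
        gains = [bernoulli_kl(emp_expectation(g),
                              model_expectation(g, active, weights))
                 for g in candidates]
        best = int(np.argmax(gains))
        if gains[best] < min_gain:
            break
        active.append(candidates.pop(best))
        # Parameter estimation: refit all weights of the enlarged field.
        weights = fit_weights(active)
    return active, weights
```

In practice this gain is only an approximation to the true improvement, since adding a feature and then refitting all weights can reduce the divergence further than adjusting the new weight alone.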
The paper discusses the form of the random field models and the general learning algorithm. It presents the feature selection step and addresses cases where Monte Carlo methods are needed to estimate the expectations that appear in the selection and estimation equations. The improved iterative scaling algorithm is introduced and proven to converge. The application of the method to word morphology is described, where the algorithm learns features that characterize word spellings. The method is shown to effectively capture the structure of word spellings, including common patterns and regular expressions. The algorithm is also shown to handle large configuration spaces using Monte Carlo techniques. The paper concludes with a discussion of the relationship between the method and other learning approaches, as well as possible extensions.
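As a hedged illustration of the parameter-estimation step, the sketch below implements a parallel improved-iterative-scaling update for the case where the configuration space is small enough to enumerate exactly; the Newton solve for each update follows the standard iterative-scaling recipe rather than being copied from the paper, and the reference distribution is taken to be uniform for simplicity.

```python
import numpy as np

def improved_iterative_scaling(features, omegas, p_emp, n_iters=100):
    """Fit feature weights of an exponential field by improved iterative scaling.

    features : list of functions f_i(omega) returning non-negative values
    omegas   : list of all configurations (assumed small enough to enumerate)
    p_emp    : empirical probability of each configuration, aligned with omegas
    """
    F = np.array([[f(w) for f in features] for w in omegas], dtype=float)
    fsharp = F.sum(axis=1)          # f#(omega) = sum_i f_i(omega)
    emp = p_emp @ F                 # empirical feature expectations
    lam = np.zeros(len(features))
    for _ in range(n_iters):
        p = np.exp(F @ lam)
        p /= p.sum()                # current model distribution
        for i in range(len(features)):
            # Solve emp[i] = sum_w p(w) f_i(w) exp(delta * f#(w)) for delta
            # by Newton's method; the exact solution gives the iterative
            # scaling update, which does not increase D(p_emp || p).
            delta = 0.0
            for _ in range(20):
                e = p * F[:, i] * np.exp(delta * fsharp)
                residual = e.sum() - emp[i]
                slope = (e * fsharp).sum()
                if slope <= 0:
                    break
                delta -= residual / slope
            lam[i] += delta
    return lam
```

When the configuration space is too large to enumerate, the sums over `omegas` would be replaced by averages over samples drawn from the current field (for example with a Gibbs sampler), which is the role the Monte Carlo techniques play in the summary above.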