May 1995 | Stephen Della Pietra†, Vincent Della Pietra†, John Lafferty‡
The paper presents a method for constructing random fields from training samples, with the goal of approximating the empirical distribution of the data. The method incrementally builds increasingly complex fields by allowing potential functions, or features, supported by ever-larger subgraphs. Each feature is assigned a weight that is trained to minimize the Kullback-Leibler divergence between the model and the empirical distribution. A greedy algorithm selects which feature to add to the field at each stage, and an iterative scaling algorithm estimates the optimal weights. The resulting random fields differ from those common in the computer vision literature in that they are non-Markovian and have a large number of parameters to estimate. The paper also discusses the relationship between this method and other learning approaches, such as decision trees and Boltzmann machines, and describes an application to automatic word classification in natural language processing that demonstrates the effectiveness of the method.
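To make the induction loop concrete, here is a minimal NumPy sketch on a toy enumerable domain. The 4-bit domain, the candidate feature pool, the random stand-in for the empirical distribution, and all function names are illustrative assumptions rather than anything taken from the paper. Two further simplifications: where the paper fits weights with improved iterative scaling, the sketch uses plain gradient ascent on the same log-likelihood, and it ranks candidate features by the closed-form gain available for 0/1-valued features.

```python
import numpy as np

# Toy sketch of greedy feature induction for an exponential model
#   p_lambda(x) proportional to exp(sum_i lambda_i * f_i(x))
# over a small enumerable domain. The domain, candidate features, and
# random "empirical" distribution are illustrative assumptions; the paper
# fits weights with improved iterative scaling, whereas this sketch uses
# plain gradient ascent, which maximizes the same log-likelihood.

def model_dist(feats, lambdas):
    """p_lambda(x) for every x; feats is a (|X|, k) 0/1 feature matrix."""
    scores = feats @ lambdas
    w = np.exp(scores - scores.max())
    return w / w.sum()

def fit_weights(feats, emp, iters=500, lr=0.5):
    """Gradient ascent on the log-likelihood: grad_i = E~[f_i] - E_p[f_i]."""
    lambdas = np.zeros(feats.shape[1])
    for _ in range(iters):
        p = model_dist(feats, lambdas)
        lambdas += lr * (emp @ feats - p @ feats)
    return lambdas

def gain(g, emp, p):
    """Drop in KL divergence from adding binary feature g with its best
    single weight alpha (closed form for 0/1 features)."""
    e_emp, e_p = emp @ g, p @ g
    if min(e_emp, e_p) <= 0.0 or max(e_emp, e_p) >= 1.0:
        return 0.0
    alpha = np.log(e_emp * (1 - e_p) / (e_p * (1 - e_emp)))
    return alpha * e_emp - np.log(p @ np.exp(alpha * g))

# Domain: all 4-bit strings; a random stand-in for the empirical distribution.
X = np.array([[int(b) for b in format(i, "04b")] for i in range(16)], dtype=float)
emp = np.random.default_rng(0).dirichlet(0.3 * np.ones(16))
candidates = [X[:, j] for j in range(4)]                                      # single bits
candidates += [X[:, i] * X[:, j] for i in range(4) for j in range(i + 1, 4)]  # bit pairs

active = np.empty((16, 0))
for step in range(4):  # induce four features greedily
    p = model_dist(active, fit_weights(active, emp))
    best = max(range(len(candidates)), key=lambda c: gain(candidates[c], emp, p))
    active = np.column_stack([active, candidates.pop(best)])
    q = model_dist(active, fit_weights(active, emp))
    kl = np.sum(emp * np.log(emp / q))
    print(f"step {step}: {active.shape[1]} features, KL(emp || model) = {kl:.4f}")
```

Run on its own, the sketch prints the KL divergence between the stand-in empirical distribution and the fitted model after each induction step; the value should decrease as features are added, which is the behavior the paper's greedy induction procedure is designed to produce.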