Large-Scale Bayesian Logistic Regression for Text Categorization
Technometrics, August 2007, Vol. 49, No. 3 | Alexander Genkin, David D. Lewis, David Madigan
The paper presents a Bayesian approach to logistic regression for text categorization that addresses the computational and statistical challenges posed by high-dimensional data. The key to the approach is a Laplace prior probability distribution that favors sparseness in the fitted model and so avoids overfitting, together with an optimization algorithm tailored to this prior. Applied to a range of document classification problems, the method produces compact predictive models that are at least as effective as those produced by support vector machine classifiers or by ridge logistic regression combined with feature selection. The paper describes the model-fitting algorithm, open-source implementations (BBR and BMR), and experimental results demonstrating the effectiveness of the proposed method on text categorization tasks.
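A compact way to see why the Laplace prior yields sparse models is to write the posterior mode as an L1-penalized likelihood. The notation below (beta for the coefficient vector, lambda for the prior's scale hyperparameter, x_i and y_i for the i-th document's feature vector and label) is assumed for illustration rather than quoted from the paper:

    p(y_i = +1 \mid \beta, x_i) = \frac{\exp(\beta^\top x_i)}{1 + \exp(\beta^\top x_i)},
    \qquad
    p(\beta_j \mid \lambda) = \frac{\lambda}{2} \exp(-\lambda |\beta_j|),

    \hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta} \Big( -\sum_i \log p(y_i \mid \beta, x_i) + \lambda \sum_j |\beta_j| \Big).

The absolute-value penalty contributed by the prior is non-differentiable at zero, which is what drives many coefficients exactly to zero and is why the paper pairs the prior with an optimization algorithm designed for it.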
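The same posterior-mode estimate can be reproduced with any L1-penalized logistic regression solver. Below is a minimal sketch using scikit-learn rather than the authors' BBR software; the toy corpus, labels, and regularization setting C (scikit-learn's inverse penalty weight, so a larger C corresponds to a weaker prior) are illustrative assumptions, not values from the paper:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy two-class corpus (finance vs. sports); purely illustrative.
    docs = [
        "stock prices rose in early trading",
        "the central bank raised interest rates",
        "the team won the championship game",
        "injury forces star player to retire",
    ]
    labels = np.array([1, 1, 0, 0])

    X = TfidfVectorizer().fit_transform(docs)  # sparse document-term matrix

    # penalty="l1" is the lasso penalty, i.e. the MAP estimate under a Laplace prior.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X, labels)

    # The Laplace prior drives many coefficients exactly to zero, giving the
    # compact models the paper emphasizes.
    print("nonzero coefficients:", int((clf.coef_ != 0).sum()), "of", clf.coef_.size)

On realistic corpora, the count of nonzero coefficients is typically a small fraction of the vocabulary size, which is the sense in which the fitted models are compact.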