1998 | Christopher K.I. Williams and David Barber
The paper discusses the application of Gaussian processes (GPs) to classification problems, extending the method from regression. The authors propose a fully Bayesian approach to predicting the probability that an input vector \( \mathbf{x} \) belongs to one of \( m \) classes, i.e., estimating \( P(c|\mathbf{x}) \) for \( c = 1, \ldots, m \). For a two-class problem, the probability of class 1 given \( \mathbf{x} \) is obtained as \( \sigma(y(\mathbf{x})) \), where \( \sigma \) is the logistic function and \( y(\mathbf{x}) \) is the activation passed through it. The activation \( y(\mathbf{x}) \) is given a Gaussian process prior, which is combined with the training data to make predictions at new input points \( \mathbf{x} \). The method is generalized to multi-class problems using the softmax function.
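As a compact statement of the model just described (the zero prior mean and the symbol \( C \) for the covariance function are notational assumptions here, not quotes from the paper):
\[
P(c{=}1 \mid \mathbf{x}) = \sigma\big(y(\mathbf{x})\big) = \frac{1}{1 + e^{-y(\mathbf{x})}},
\qquad y(\cdot) \sim \mathcal{GP}\big(0,\, C(\mathbf{x}, \mathbf{x}')\big),
\]
with the logistic function replaced in the \( m \)-class case by the softmax over per-class activations \( y_1(\mathbf{x}), \ldots, y_m(\mathbf{x}) \):
\[
P(c \mid \mathbf{x}) = \frac{\exp\{y_c(\mathbf{x})\}}{\sum_{c'=1}^{m} \exp\{y_{c'}(\mathbf{x})\}}.
\]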
The paper provides a Bayesian treatment, integrating over uncertainty in \( y \) and the parameters controlling the Gaussian process prior. The necessary integration over \( y \) is approximated using Laplace's method. The authors also introduce a specific covariance function for the Gaussian process prior, which they argue is useful for modeling purposes.
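To make the Laplace step concrete, the following is a minimal two-class sketch in Python. It assumes a plain squared-exponential covariance with fixed hyperparameters (not the specific covariance function introduced in the paper, and with no integration over hyperparameters) and a MacKay-style approximation to the final one-dimensional integral over the latent activation at a test point; it illustrates the idea rather than reproducing the authors' implementation.

```python
import numpy as np

# Illustrative sketch: two-class GP classification with a Laplace approximation
# to the posterior over the latent activations y at the training inputs.
# Kernel form, jitter values, and the final predictive approximation are
# common choices made here for illustration, not taken from the paper.

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode(K, t, n_iter=50, tol=1e-8):
    """Newton iterations for the mode of p(y | t) under a logistic likelihood.

    K : (n, n) prior covariance of the latent activations at the training inputs
    t : (n,) binary targets in {0, 1}
    """
    n = len(t)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        W = np.diag(pi * (1.0 - pi))   # negative Hessian of the log-likelihood
        # Newton step: f_new = (W + K^{-1})^{-1}(W f + t - pi) = K (W K + I)^{-1}(W f + t - pi)
        f_new = K @ np.linalg.solve(W @ K + np.eye(n), W @ f + (t - pi))
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f

def predict(X_train, t, X_test, lengthscale=1.0, variance=1.0):
    """Approximate P(class 1 | x*) for each test input."""
    n = len(t)
    K = rbf_kernel(X_train, X_train, lengthscale, variance) + 1e-8 * np.eye(n)
    f_hat = laplace_mode(K, t)
    pi_hat = sigmoid(f_hat)
    W = np.diag(pi_hat * (1.0 - pi_hat))

    k_star = rbf_kernel(X_train, X_test, lengthscale, variance)           # (n, m)
    k_ss = np.diag(rbf_kernel(X_test, X_test, lengthscale, variance))     # (m,)

    mean = k_star.T @ (t - pi_hat)                       # latent mean at the mode
    v = np.linalg.solve(K + np.linalg.inv(W + 1e-8 * np.eye(n)), k_star)
    var = k_ss - np.sum(k_star * v, axis=0)              # latent variance

    # Average the sigmoid over the Gaussian for the test activation
    # (MacKay-style approximation to the one-dimensional integral).
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mean)
```

The Newton iteration maximizes the unnormalized log-posterior over the latent activations; the Gaussian centred at the resulting mode, with covariance given by the negative inverse Hessian, is the approximation that Laplace's method supplies.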
The method is evaluated on several datasets, demonstrating its effectiveness. The paper discusses the computational aspects and compares the results with other methods, including maximum penalized likelihood estimation and Neal's MCMC method. The authors highlight the interpretability of the Gaussian process prior and its advantages over models that place priors directly on parameter (weight) space. They also discuss similarities between GP classifiers and support-vector machines (SVMs) and suggest future research directions, such as improving computational efficiency and exploring different covariance functions.