1996 | Christopher K. I. Williams, Carl Edward Rasmussen
The paper "Gaussian Processes for Regression" by Christopher K. I. Williams and Carl Edward Rasmussen explores the use of Gaussian processes in Bayesian regression. The authors address the complexity of Bayesian analysis in neural networks, where a simple prior over weights leads to a complex prior distribution over functions. They propose using Gaussian process priors, which allow for exact Bayesian analysis using matrix operations. Two methods, optimization and averaging via Hybrid Monte Carlo (HMC), are tested on challenging problems and show excellent results.
The introduction explains that the Bayesian approach to neural networks combines a prior distribution over weights with a noise model to yield a posterior distribution over functions. The authors argue that neural network models should not be limited to a small number of hidden units, and that as the number of hidden units grows large the prior over functions implied by many weight priors converges to a Gaussian process. They therefore work with Gaussian process priors parameterized by hyperparameters, which can be estimated from the data using maximum likelihood or Bayesian methods and which naturally give rise to Automatic Relevance Determination (ARD).
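As a compact sketch of the setup (written in standard Gaussian process notation, which may differ slightly from the paper's own symbols), a zero-mean Gaussian process prior whose covariance function carries the hyperparameters is combined with an independent Gaussian noise model on the targets:

```latex
f(\mathbf{x}) \sim \mathcal{GP}\bigl(0,\; C_{\theta}(\mathbf{x}, \mathbf{x}')\bigr),
\qquad
t_i = f(\mathbf{x}^{(i)}) + \varepsilon_i,
\quad \varepsilon_i \sim \mathcal{N}(0, \sigma_{\nu}^{2}).
```

Estimating the hyperparameters (for example, the per-input scales behind ARD) is then the training problem discussed below.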
The paper then details prediction with Gaussian processes, showing that the predictive distribution for a test case is Gaussian and can be computed exactly with matrix operations over the training data. It introduces a covariance function expressing the belief that nearby inputs should produce highly correlated outputs, with one scale hyperparameter per input dimension so that irrelevant inputs can be detected. The authors also discuss the relationship between Gaussian processes and other regression methods, such as ARMA models and spline smoothing.
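Concretely, writing K for the covariance matrix of the n training cases (including the noise contribution), k(x) for the vector of covariances between a test input and the training inputs, and t for the training targets, the predictive distribution is Gaussian with mean and variance given by the standard Gaussian process regression expressions:

```latex
\hat{y}(\mathbf{x}) = \mathbf{k}^{\top}(\mathbf{x})\, K^{-1}\, \mathbf{t},
\qquad
\sigma_{\hat{y}}^{2}(\mathbf{x}) = C(\mathbf{x}, \mathbf{x}) - \mathbf{k}^{\top}(\mathbf{x})\, K^{-1}\, \mathbf{k}(\mathbf{x}).
```

The covariance function used in the paper is, up to notational details, of the form

```latex
C\bigl(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\bigr) =
  v_0 \exp\!\Bigl(-\tfrac{1}{2} \sum_{l=1}^{d} w_l \bigl(x_l^{(i)} - x_l^{(j)}\bigr)^{2}\Bigr)
  + a_0 + a_1 \sum_{l=1}^{d} x_l^{(i)} x_l^{(j)} + v_1\, \delta(i, j),
```

where the exponential term makes nearby inputs highly correlated, and a weight w_l driven toward zero marks input dimension l as irrelevant, which is the ARD effect.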
Training a Gaussian process amounts to adjusting the hyperparameters of the covariance function, and the authors present two approaches: maximizing the likelihood of the training data with respect to the hyperparameters, and integrating over the hyperparameters with HMC. The HMC method simulates a fictitious dynamical system over the hyperparameters, periodically refreshing the momentum variables by Gibbs sampling, in order to draw samples from their posterior distribution.
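The maximum-likelihood route can be sketched as follows, assuming an ARD squared-exponential covariance and generic NumPy/SciPy tooling; the function names, the optimizer, and the toy data are illustrative assumptions rather than the paper's implementation, and the HMC alternative is not shown.

```python
import numpy as np
from scipy.optimize import minimize

def ard_cov(X1, X2, log_v0, log_w):
    """Covariance v0 * exp(-0.5 * sum_l w_l (x_l - x'_l)^2) between two input sets."""
    w = np.exp(log_w)                      # per-input ARD weights
    d = X1[:, None, :] - X2[None, :, :]    # pairwise input differences
    return np.exp(log_v0) * np.exp(-0.5 * np.einsum('ijl,l->ij', d**2, w))

def neg_log_marginal_likelihood(params, X, t):
    """-log p(t | X, hyperparameters) for a zero-mean GP with Gaussian noise."""
    log_v0, log_noise = params[0], params[1]
    log_w = params[2:]
    n = X.shape[0]
    K = ard_cov(X, X, log_v0, log_w) + np.exp(log_noise) * np.eye(n)
    L = np.linalg.cholesky(K)              # K = L L^T for a stable solve
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))   # alpha = K^{-1} t
    return 0.5 * t @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

# Toy data (illustrative): only the first input matters, so the learned ARD
# weight for the second input should be driven toward zero.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
t = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)

init = np.zeros(2 + X.shape[1])            # log v0, log noise, log w_1..w_d
res = minimize(neg_log_marginal_likelihood, init, args=(X, t), method='L-BFGS-B')
print('learned ARD weights w_l:', np.exp(res.x[2:]))
```

Working with the logarithms of the hyperparameters keeps them positive during unconstrained optimization; the fitted weights can then be inspected to see which inputs the model treats as relevant.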
Experimental results are reported on a modified robot arm problem and five real-world datasets, showing that Gaussian processes perform well compared to other regression algorithms. The authors also discuss future directions, including classification problems, non-stationary covariance functions, and more complex covariance functions.