Deep Neural Networks as Gaussian Processes

3 Mar 2018 | Jaehoon Lee*, Yasaman Bahri*, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
Deep neural networks (DNNs) and Gaussian processes (GPs) are shown to be equivalent in the limit of infinite layer width, and this equivalence allows exact Bayesian inference for regression tasks using a GP. The paper derives the exact correspondence between infinitely wide DNNs and GPs, develops a computationally efficient method for evaluating the corresponding covariance function, and uses the resulting "neural network GP" (NNGP) to perform Bayesian inference on MNIST and CIFAR-10.

Empirically, the accuracy of trained finite-width networks approaches that of the corresponding NNGP as layer width increases, and the NNGP often outperforms the finite-width networks it approximates. Because inference is exact and Bayesian, the NNGP yields explicit uncertainty estimates, which are strongly correlated with prediction error, and it requires no stochastic gradient-based training. The paper also connects NNGP performance to recent theories of signal propagation in random neural networks: the depthwise convergence of the kernel to a fixed point aligns with the phase diagram of deep signal propagation, suggesting a route to principled hyperparameter selection and nonlinearity design for deep learning.
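To make the construction concrete, below is a minimal NumPy sketch of the two ingredients the summary describes: the layer-by-layer NNGP covariance recursion (written here for ReLU activations, where the per-layer expectation has a closed-form arccosine expression) and exact GP posterior prediction from that covariance. The function names, the default sigma_w2/sigma_b2 values, and the noise level are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

def nngp_kernel_relu(X1, X2, depth, sigma_w2=1.6, sigma_b2=0.1):
    """NNGP covariance between rows of X1 and X2 for a fully connected ReLU
    network of the given depth in the infinite-width limit.
    sigma_w2 / sigma_b2 are weight / bias prior variances (illustrative defaults)."""
    d_in = X1.shape[1]
    # Layer 0: K^0(x, x') = sigma_b^2 + sigma_w^2 * <x, x'> / d_in
    K = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d_in
    k1 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d_in  # K^0(x, x)
    k2 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d_in  # K^0(x', x')
    for _ in range(depth):
        norm = np.sqrt(np.outer(k1, k2))
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        # E[relu(u) relu(v)] for (u, v) ~ N(0, K^{l-1}) has a closed form
        # (the arccosine kernel), so each layer is a cheap elementwise update:
        K = sigma_b2 + (sigma_w2 / (2 * np.pi)) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        # On the diagonal theta = 0, so the recursion simplifies:
        k1 = sigma_b2 + 0.5 * sigma_w2 * k1
        k2 = sigma_b2 + 0.5 * sigma_w2 * k2
    return K

def nngp_predict(X_train, Y_train, X_test, depth=3, noise=1e-2):
    """Exact GP regression with the NNGP kernel: posterior mean and variance."""
    K_tt = nngp_kernel_relu(X_train, X_train, depth)          # (n, n)
    K_ts = nngp_kernel_relu(X_train, X_test, depth)           # (n, m)
    k_ss = np.diag(nngp_kernel_relu(X_test, X_test, depth))   # (m,)
    L = np.linalg.cholesky(K_tt + noise * np.eye(len(X_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y_train))
    mean = K_ts.T @ alpha                      # posterior mean, (m, n_outputs)
    v = np.linalg.solve(L, K_ts)
    var = k_ss - np.sum(v * v, axis=0)         # posterior variance, (m,)
    return mean, var
```

In this setting, classification is treated as regression on class-label targets (taking the argmax of the posterior mean over classes), and the posterior variance plays the role of the uncertainty estimate whose correlation with prediction error is reported in the results.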