This paper introduces stochastic variational inference (SVI) for Gaussian process (GP) models, enabling the application of GPs to large datasets with millions of data points. The key idea is to use inducing variables to decouple the latent function values, so that a variational lower bound factorises across data points and can be optimised stochastically. The approach extends to non-Gaussian likelihoods and to latent variable models based on GPs. The paper demonstrates the method on a toy problem and two real-world datasets.
Gaussian processes are powerful tools for inference over functions but suffer from $ \mathcal{O}(n^3) $ computational complexity and $ \mathcal{O}(n^2) $ storage, where $n$ is the number of training points. Traditional approximations, such as partitioning the data or using low-rank approximations built on $m \ll n$ inducing points, reduce the complexity to $ \mathcal{O}(nm^2) $ and the storage to $ \mathcal{O}(nm) $, but each optimisation step still touches all $n$ points, which remains prohibitive for big data. The paper shows how SVI can be combined with inducing variables to yield a practical algorithm for fitting GPs.
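To see where the $ \mathcal{O}(nm^2) $ cost comes from, note that Nyström-style low-rank methods replace the full $n \times n$ covariance with an approximation built from the $m$ inducing points (a standard identity, not specific to this paper):
$$K_{nn} \approx K_{nm} K_{mm}^{-1} K_{mn},$$
where $K_{nm}$ holds the covariances between the $n$ data points and the $m$ inducing points. The matrix inversion lemma then turns the $ \mathcal{O}(n^3) $ inverse of the noisy covariance into an $ \mathcal{O}(nm^2) $ computation.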
The paper revisits the sparse GP approach of Titsias (2009), introducing an explicit variational distribution over the inducing variables and deriving a lower bound on the log marginal likelihood. This bound is then used to derive a stochastic variational inference algorithm. The key insight is that the bound decomposes into a sum of terms, one per input-output pair, plus a single global KL term, which is exactly the structure stochastic gradient methods require.
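Up to notational choices, the bound for Gaussian noise with precision $\beta$ has the form
$$\log p(\mathbf{y}) \geq \sum_{i=1}^{n} \left\{ \log \mathcal{N}\!\left(y_i \,\middle|\, \mathbf{k}_i^{\top} K_{mm}^{-1} \mathbf{m},\, \beta^{-1}\right) - \frac{\beta}{2}\,\tilde{k}_{ii} - \frac{1}{2}\operatorname{tr}\!\left(S \Lambda_i\right) \right\} - \operatorname{KL}\!\left(q(\mathbf{u}) \,\|\, p(\mathbf{u})\right),$$
where $q(\mathbf{u}) = \mathcal{N}(\mathbf{m}, S)$ is the variational distribution over the inducing variables, $\mathbf{k}_i$ collects the covariances between $\mathbf{u}$ and $f(\mathbf{x}_i)$, $\tilde{k}_{ii} = k_{ii} - \mathbf{k}_i^{\top} K_{mm}^{-1} \mathbf{k}_i$, and $\Lambda_i = \beta K_{mm}^{-1} \mathbf{k}_i \mathbf{k}_i^{\top} K_{mm}^{-1}$. Only the KL term is global; everything else is a sum over data points that can be estimated unbiasedly from a minibatch.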
The paper also derives natural gradients for the variational distribution, exploiting the exponential-family form of the Gaussian $q(\mathbf{u})$ to make the stochastic updates cheap and well conditioned, and shows how SVI can be applied to GP-based latent variable models. It demonstrates the method on a toy dataset and on real-world data, including UK apartment prices and airline delays. The results show that the method handles large datasets efficiently, with per-iteration computational complexity $ \mathcal{O}(m^3) $, where $m$ is the number of inducing variables, allowing much larger $m$ than traditional sparse GPs.
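To make the procedure concrete, here is a minimal numpy sketch of natural-gradient SVI updates for sparse GP regression. It is a sketch under stated assumptions, not the paper's implementation: the RBF kernel with fixed hyperparameters, the known noise precision, and the synthetic 1-D data are all illustrative, and hyperparameter learning and ELBO evaluation are omitted.

```python
import numpy as np

# Minimal sketch of natural-gradient SVI for sparse GP regression.
# Assumptions (not from the paper): fixed RBF kernel hyperparameters,
# known noise precision beta, and synthetic 1-D data.

def rbf(A, B, ell=1.0, var=1.0):
    """RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
n, m, b, beta, lr = 2000, 20, 100, 25.0, 0.1    # data size, inducing points, batch size, noise precision, step length
X = rng.uniform(0.0, 10.0, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=beta ** -0.5, size=n)
Z = np.linspace(0.0, 10.0, m)[:, None]          # inducing inputs

Kmm = rbf(Z, Z) + 1e-6 * np.eye(m)              # jitter for numerical stability
Kmm_inv = np.linalg.inv(Kmm)

# Natural parameters of q(u) = N(mu, S): theta1 = S^{-1} mu, theta2 = -0.5 S^{-1}.
theta1 = np.zeros(m)
theta2 = -0.5 * Kmm_inv                         # initialise q(u) at the prior

for step in range(500):
    idx = rng.choice(n, size=b, replace=False)  # sample a minibatch
    Kmb = rbf(Z, X[idx])
    scale = n / b                               # unbiased rescaling of the data terms
    # Minibatch estimate of the full-data optimum in natural coordinates.
    Lam = Kmm_inv + scale * beta * Kmm_inv @ Kmb @ Kmb.T @ Kmm_inv
    r = scale * beta * Kmm_inv @ (Kmb @ y[idx])
    # A unit-length natural-gradient step lands exactly on that estimated
    # optimum, so a step of length lr interpolates the natural parameters.
    theta1 = (1.0 - lr) * theta1 + lr * r
    theta2 = (1.0 - lr) * theta2 + lr * (-0.5 * Lam)

S = np.linalg.inv(-2.0 * theta2)                # recover moment parameters
mu = S @ theta1                                 # posterior mean over inducing variables
```

With `lr = 1` and the full dataset as the batch, a single step recovers the batch variational optimum of Titsias (2009); smaller stochastic steps trade per-iteration cost for convergence over many minibatches.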
The paper concludes that the proposed method enables the application of GP techniques to big data, with natural extensions to multiple outputs and other GP-based models. The method is implemented using the GPy toolkit and is shown to be effective on real-world datasets.