Deep Kernel Learning

6 Nov 2015 | Andrew Gordon Wilson*, Zhiting Hu*, Ruslan Salakhutdinov, Eric P. Xing
This paper introduces scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. The approach transforms the inputs of a spectral mixture base kernel with a deep architecture, and uses local kernel interpolation, inducing points, and structure-exploiting algebra (Kronecker and Toeplitz) to obtain a scalable kernel representation. The resulting closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. All kernel properties, including the network weights, are learned jointly through the marginal likelihood of a Gaussian process, with inference and learning costs of O(n) for n training points and prediction costs of O(1) per test point. The method is shown to outperform scalable Gaussian processes with flexible kernel learning models, as well as stand-alone deep architectures, on a wide range of applications, including a dataset with 2 million examples.
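In other words, the deep kernel applies a base kernel to inputs that have first been warped by a neural network, k(x, x') = k_base(g(x; w), g(x'; w)), and the network weights w are treated as additional kernel hyperparameters trained by maximizing the GP marginal likelihood. The sketch below illustrates this construction in PyTorch; it is not the paper's implementation: it substitutes an RBF base kernel for the spectral mixture kernel, uses exact O(n^3) GP inference rather than the KISS-GP approximation that gives the O(n) costs quoted above, and the class name and hyperparameter choices are illustrative only.

```python
import torch
import torch.nn as nn

class DeepKernelGP(nn.Module):
    """Deep kernel: a base kernel applied to neural-network features,
    k_deep(x, x') = k_base(g(x; w), g(x'; w))."""

    def __init__(self, in_dim, feat_dim=2):
        super().__init__()
        # g(x; w): feature extractor whose weights become kernel hyperparameters.
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        # Hyperparameters of the base kernel and the Gaussian noise model.
        self.log_lengthscale = nn.Parameter(torch.zeros(()))
        self.log_outputscale = nn.Parameter(torch.zeros(()))
        self.log_noise = nn.Parameter(torch.tensor(-2.0))

    def kernel(self, a, b):
        # RBF base kernel on learned features (stand-in for the spectral mixture kernel).
        za, zb = self.net(a), self.net(b)
        d2 = (za.unsqueeze(1) - zb.unsqueeze(0)).pow(2).sum(-1)
        return self.log_outputscale.exp() * torch.exp(-0.5 * d2 / self.log_lengthscale.exp() ** 2)

    def neg_log_marginal_likelihood(self, x, y):
        # Exact GP marginal likelihood; the paper replaces this O(n^3) step with KISS-GP.
        n = x.shape[0]
        K = self.kernel(x, x) + self.log_noise.exp() * torch.eye(n)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        quad = (y.unsqueeze(-1).transpose(-2, -1) @ alpha).squeeze()
        logdet = 2.0 * torch.log(torch.diagonal(L)).sum()
        return 0.5 * (quad + logdet + n * torch.log(torch.tensor(2.0 * torch.pi)))


# Joint training of network weights and kernel hyperparameters
# by maximizing the GP marginal likelihood (minimizing its negative) on toy data.
x = torch.randn(200, 5)
y = torch.sin(x.sum(dim=-1)) + 0.1 * torch.randn(200)

model = DeepKernelGP(in_dim=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = model.neg_log_marginal_likelihood(x, y)
    loss.backward()
    opt.step()
```

Because the network weights and the kernel hyperparameters share a single training objective, there is no separate pre-training of the network and no held-out validation loop; the marginal likelihood trades off data fit against model complexity directly.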
The paper discusses the limitations of Gaussian processes (GPs) and deep neural networks (DNNs) in isolation, motivating a framework that combines their strengths. It applies deep feedforward and convolutional networks to the inputs of spectral mixture covariance functions, and uses inducing points and local kernel interpolation to produce scalable, expressive closed-form kernels. The approach builds on the KISS-GP method (sketched below) for an efficient kernel representation, enabling scaling that is linear in the number of training instances. Because the model is non-parametric, its complexity is calibrated automatically through the marginal likelihood, without separate regularization or cross-validation.

The method is evaluated on a range of regression tasks, including UCI regression datasets, face orientation extraction, digit magnitude recovery, and step function recovery. Deep kernel learning (DKL) consistently outperforms standard GPs, expressive kernel learning approaches, and stand-alone deep neural networks, while keeping prediction times fast. Scalability experiments confirm linear scaling with the number of training instances and efficient computation on large datasets. The paper also shows that spectral mixture base kernels outperform RBF base kernels by capturing more complex structure in the data. Overall, the results demonstrate the effectiveness of combining deep learning and kernel methods for scalable, expressive modeling.
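For reference, the structured kernel interpolation behind KISS-GP, on which the scalability claims above rest, can be summarized as follows; this is a brief sketch of the idea rather than the paper's full derivation, with Z a regular grid of m inducing points and W a sparse matrix of local interpolation weights:

K_{X,X} ≈ W K_{Z,Z} W^T.

Each row of W has only a handful of nonzero entries (local cubic interpolation weights), and because Z lies on a regular grid, K_{Z,Z} admits Kronecker and Toeplitz structure. Together, the sparsity of W and the structure of K_{Z,Z} yield the O(n) training and O(1) per-test-point prediction costs quoted above.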