6 Nov 2015 | Andrew Gordon Wilson*, Zhiting Hu*, Ruslan Salakhutdinov, Eric P. Xing
The paper introduces scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, the inputs of a spectral mixture base kernel are transformed by a deep architecture, and local kernel interpolation, inducing points, and structure-exploiting algebra (Kronecker and Toeplitz) yield a scalable kernel representation. The resulting closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in both expressive power and scalability. All kernel properties are learned jointly through the marginal likelihood of a Gaussian process, achieving $\mathcal{O}(n)$ inference and learning costs for $n$ training points and $\mathcal{O}(1)$ prediction cost per test point. Evaluated on a wide range of datasets, the approach outperforms scalable Gaussian processes with flexible kernel learning models as well as stand-alone deep architectures, and the experiments show that the method handles large datasets efficiently.
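To make the construction concrete, below is a minimal PyTorch sketch (not the authors' implementation): a small MLP plays the role of the deep input transform $g(x, w)$, a spectral mixture kernel is applied to the warped features, and all deep and kernel hyperparameters are trained jointly by maximizing the exact GP log marginal likelihood. The class and function names (`DeepSMKernel`, `gp_nll`), network sizes, mixture count, and noise level are illustrative assumptions, and the KISS-GP scalability machinery (local interpolation, Kronecker/Toeplitz algebra) is omitted, so this naive version costs $\mathcal{O}(n^3)$ rather than $\mathcal{O}(n)$.

```python
import math
import torch
import torch.nn as nn


class DeepSMKernel(nn.Module):
    """Spectral mixture kernel applied to deep features g(x, w) (illustrative sketch)."""

    def __init__(self, in_dim, feat_dim=2, num_mixtures=4):
        super().__init__()
        self.net = nn.Sequential(              # deep input transform g(x, w)
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, feat_dim),
        )
        # SM parameters, stored on a log scale where positivity is required:
        # mixture weights w_q, spectral means mu_q, spectral variances v_q.
        self.log_w = nn.Parameter(torch.zeros(num_mixtures))
        self.mu = nn.Parameter(torch.rand(num_mixtures, feat_dim))
        self.log_v = nn.Parameter(torch.zeros(num_mixtures, feat_dim))

    def forward(self, x1, x2):
        z1, z2 = self.net(x1), self.net(x2)
        tau = z1.unsqueeze(1) - z2.unsqueeze(0)   # pairwise differences, (n, m, d)
        w, v = self.log_w.exp(), self.log_v.exp()
        # k_SM(tau) = sum_q w_q prod_d exp(-2 pi^2 tau_d^2 v_qd) cos(2 pi tau_d mu_qd)
        k = 0.0
        for q in range(w.shape[0]):
            env = torch.exp(-2 * math.pi ** 2 * tau ** 2 * v[q]).prod(-1)
            cos = torch.cos(2 * math.pi * tau * self.mu[q]).prod(-1)
            k = k + w[q] * env * cos
        return k


def gp_nll(kernel, x, y, noise=1e-2):
    """Negative exact GP log marginal likelihood: the joint training objective."""
    n = x.shape[0]
    K = kernel(x, x) + noise * torch.eye(n)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)          # K^{-1} y
    return (0.5 * (y @ alpha.squeeze(-1))                     # data fit
            + L.diagonal().log().sum()                        # 0.5 * log|K|
            + 0.5 * n * math.log(2 * math.pi))


# Toy usage: fit deep and kernel hyperparameters together by gradient descent.
x = torch.linspace(-3, 3, 100).unsqueeze(-1)
y = torch.sin(3 * x).squeeze(-1) + 0.1 * torch.randn(100)
kernel = DeepSMKernel(in_dim=1)
opt = torch.optim.Adam(kernel.parameters(), lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = gp_nll(kernel, x, y)
    loss.backward()
    opt.step()
```

The key design point the summary describes is visible in the last loop: the MLP weights and the spectral mixture parameters receive gradients from the same marginal-likelihood objective, so the deep transform and the kernel are learned jointly rather than in separate stages.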