4 Nov 2019 | Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
This paper presents the first efficient exact algorithm for computing the Convolutional Neural Tangent Kernel (CNTK), which extends the Neural Tangent Kernel (NTK) to convolutional neural networks (CNNs). The algorithm is implemented on GPUs, and the resulting kernel achieves classification accuracy on CIFAR-10 that is 10% higher than previous kernel-based methods and only 6% lower than that of fully-trained finite-width CNNs (with batch normalization turned off). On the theoretical side, the paper gives the first non-asymptotic proof that a fully-trained, sufficiently wide neural network is equivalent to the kernel regression predictor arising from the NTK.
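To make the kernel-regression equivalence concrete, here is a minimal sketch of the prediction step once an NTK/CNTK Gram matrix has been computed. It is an illustration rather than the paper's implementation; the function name and the small ridge term `reg` are assumptions added here for clarity and numerical stability.

```python
import jax.numpy as jnp

def ntk_regression_predict(K_train, K_test_train, y_train, reg=1e-6):
    """Kernel regression with a precomputed NTK/CNTK Gram matrix (sketch).

    K_train:      (n, n) kernel matrix on the training inputs
    K_test_train: (m, n) kernel values between test and training inputs
    y_train:      (n, c) training targets, e.g. one-hot labels
    reg:          small ridge term for numerical stability (assumption, not from the paper)
    """
    # Solve (K + reg*I) alpha = y, then predict by weighting training targets
    # with the test-vs-train kernel values.
    alpha = jnp.linalg.solve(K_train + reg * jnp.eye(K_train.shape[0]), y_train)
    return K_test_train @ alpha
```

In this view, "training" the infinite-width network amounts to a single linear solve against the kernel matrix; all of the architectural information lives in the CNTK entries themselves.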
The paper also reviews the connection between deep learning and Gaussian processes (GPs): a neural network with a single, infinitely wide hidden layer and random parameters behaves like a GP, and this correspondence extends to deep and convolutional networks. The NTK goes further, capturing the behavior of fully-trained (rather than merely randomly initialized) infinite-width networks. It is defined as the expected inner product, over the random initialization, of the gradients of the network's output with respect to its parameters at two inputs, taken in the infinite-width limit. The CNTK is the analogous kernel for convolutional networks.
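In symbols, writing f(θ, x) for the network's scalar output on input x with parameters θ, this definition can be stated as follows (the notation is supplied here to match the verbal description above):

$$
\Theta(x, x') \;=\; \mathbb{E}_{\theta}\!\left[\left\langle \frac{\partial f(\theta, x)}{\partial \theta},\; \frac{\partial f(\theta, x')}{\partial \theta}\right\rangle\right],
$$

where the expectation is over the random initialization of θ and the limit is taken as the layer widths go to infinity. The CNTK is the same quantity computed for a convolutional architecture.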
The paper shows that fully-trained infinite-width CNNs can come close to their finite-width counterparts, with the CNTK corresponding to an 11-layer CNN with global average pooling reaching 77% accuracy on CIFAR-10. It also demonstrates that random feature methods for approximating the CNTK do not yield good approximations, performing much worse on CIFAR-10 (a sketch of this style of approximation follows below). The results suggest that the NTK captures the behavior of fully-trained wide networks under weaker conditions than previous proofs required. The paper concludes that the CNTK is a powerful tool for understanding the behavior of deep learning models in the infinite-width limit.
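One common reading of such a random-feature approach (an assumption here, not a detail spelled out in this summary) is to use the parameter gradients of a finite-width network at random initialization as an explicit feature map and to take inner products of those gradients as the kernel. A minimal JAX sketch of that idea for a scalar-output network might look like this; `apply_fn` and `params` are hypothetical placeholders:

```python
import jax
import jax.numpy as jnp

def empirical_ntk(apply_fn, params, x1, x2):
    """Gradient-feature ("random feature") kernel of a finite-width net (sketch).

    apply_fn(params, x) is assumed to return the network's scalar output;
    params is the pytree of randomly initialized parameters.
    """
    # Parameter gradients at the two inputs act as the random feature vectors.
    g1 = jax.grad(lambda p: apply_fn(p, x1))(params)
    g2 = jax.grad(lambda p: apply_fn(p, x2))(params)
    # The kernel entry is the inner product of the two gradient pytrees.
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(g1),
                               jax.tree_util.tree_leaves(g2)))
```

As the width grows, this quantity converges to the exact CNTK entry, but at realistic finite widths the paper's results indicate it is a poor substitute for the exact computation.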