27 Oct 2014 | Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
This paper presents a method for reducing the number of parameters in deep learning models by exploiting the structured nature of learned weights. The key idea is to represent each weight matrix as a low-rank product of two smaller matrices, which allows for a significant reduction in parameters while maintaining model performance. The technique is general, applies to a variety of architectures, and is compatible with other recent training techniques such as dropout, rectified linear units, and maxout. Because most of the weights are predicted from a small set of learned parameters, the number of parameters that must be learned can be reduced by more than 95% without affecting accuracy.
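To make the parameter savings concrete, here is a minimal numpy sketch (not code from the paper) that compares the parameter count of a fully parameterized weight matrix against a rank-k factorization; the layer sizes and rank are illustrative choices.

```python
import numpy as np

# Illustrative layer: 1024 inputs -> 1024 outputs, factorized at rank 16.
n_in, n_out, rank = 1024, 1024, 16
rng = np.random.default_rng(0)

# Full parameterization: one learned entry per connection.
full_params = n_in * n_out

# Low-rank parameterization W ~= U @ V with U: (n_in, rank), V: (rank, n_out).
U = rng.standard_normal((n_in, rank))
V = rng.standard_normal((rank, n_out))
W_approx = U @ V                      # still an (n_in, n_out) weight matrix
factored_params = U.size + V.size

print(f"full: {full_params}, factored: {factored_params}, "
      f"saved: {1 - factored_params / full_params:.1%}")
# -> full: 1048576, factored: 32768, saved: 96.9%
```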
The paper demonstrates that the weights learned by deep networks tend to be highly structured; for example, first-layer filters trained on natural image patches are spatially smooth, and this structure can be exploited to reduce the number of parameters. Using a low-rank factorization of the weight matrix makes this parameter prediction efficient: the first factor is constructed from prior knowledge about the structure of the data (for instance, a smoothness kernel over pixel locations), while the second factor is learned. The approach is shown to be effective in both convolutional and multi-layer perceptron (MLP) networks.
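The following is a minimal sketch of that idea for a single hidden unit, assuming a squared-exponential kernel over 2D pixel coordinates to encode smoothness; the patch size, the number of learned "anchor" weights, and helper names such as se_kernel are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def se_kernel(coords_a, coords_b, lengthscale=2.0):
    """Squared-exponential kernel between sets of 2D pixel coordinates (assumed choice)."""
    d2 = ((coords_a[:, None, :] - coords_b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

# Coordinates of a 16x16 input patch, one row per pixel.
side = 16
coords = np.array([(i, j) for i in range(side) for j in range(side)], dtype=float)
n_pix = side * side

rng = np.random.default_rng(0)

# A small subset of "anchor" pixel locations whose weights are actually learned.
n_anchor = 32
anchor = rng.choice(n_pix, size=n_anchor, replace=False)

# Stand-in for the learned weights of one hidden unit at the anchor locations.
w_anchor = rng.standard_normal(n_anchor)

# Predict the remaining weights by kernel ridge regression over pixel coordinates:
# the smoothness encoded by the kernel fills in the unlearned entries.
K_aa = se_kernel(coords[anchor], coords[anchor]) + 1e-6 * np.eye(n_anchor)
K_xa = se_kernel(coords, coords[anchor])
w_full = K_xa @ np.linalg.solve(K_aa, w_anchor)   # shape (256,)

print(w_full.shape, "weights predicted from", n_anchor, "learned parameters")
```

In this sketch only w_anchor would receive gradient updates; the kernel matrices play the role of the fixed first factor, which connects to the static versus dynamic parameter distinction discussed next.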
The paper also discusses the distinction between dynamic and static parameters. Dynamic parameters are updated repeatedly by the optimizer during training, while static parameters (such as the constructed first factor) are computed once and remain fixed. Static parameters are easier to handle in distributed systems, as they never need to be synchronized between machines. The technique is orthogonal to the choice of activation function and can be used alongside other recent advances in neural network training.
Experiments on various models, including MLPs, convolutional networks, and Reconstruction ICA, show that the parameter prediction technique is highly effective. In some cases, over 95% of the parameters can be predicted without any drop in accuracy. The technique is also shown to be applicable to different types of data and architectures, including image patches and speech data. The paper concludes that the method provides a valuable tool for reducing the number of parameters in deep learning models, which can lead to more efficient training and deployment of large-scale networks.