Do Deep Nets Really Need to be Deep?


11 Oct 2014 | Lei Jimmy Ba, Rich Caruana
Deep neural networks (DNNs) are currently state-of-the-art in tasks such as speech recognition and computer vision. This paper shows that shallow feed-forward networks can learn the complex functions previously learned by deep networks and achieve similar or higher accuracy with the same number of parameters. On TIMIT and CIFAR-10, shallow networks trained this way match the performance of deep models. The paper asks why deep networks outperform shallow ones trained directly on the original labels: the advantage might come from having more parameters, from being able to represent more complex functions, from learning hierarchical representations, or from learning algorithms and regularizers that work better with depth. The study's finding is that shallow networks can mimic deep networks once they are trained with model compression, suggesting that the functions learned by deep networks may not actually require depth.

The key technique is model compression: a shallow network is trained to mimic a deep network by regressing, on unlabeled data, onto the outputs (logits) of the deep model rather than the original labels. This lets the shallow model learn the function represented by the deep model without having to be deep itself.
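As an illustration, below is a minimal sketch of this mimic training in PyTorch (not the authors' code): a one-hidden-layer student is fit with an L2 regression loss on the deep teacher's logits. The layer sizes, the tanh activation, and the optimizer settings are assumptions made only for the example.

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes, chosen only for illustration.
input_dim, hidden_dim, num_classes = 1845, 8000, 183

# One-hidden-layer "student" that will mimic the deep "teacher".
# It ends in a plain linear layer so that it outputs logits, not probabilities.
student = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.Tanh(),
    nn.Linear(hidden_dim, num_classes),
)

optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
mse = nn.MSELoss()

def mimic_step(x_unlabeled: torch.Tensor, teacher_logits: torch.Tensor) -> float:
    """One training step: regress the student's logits onto the teacher's logits.

    x_unlabeled    -- a batch of inputs; no ground-truth labels are used
    teacher_logits -- the deep model's pre-softmax outputs on the same batch
    """
    optimizer.zero_grad()
    student_logits = student(x_unlabeled)
    loss = mse(student_logits, teacher_logits)  # L2 loss on logits
    loss.backward()
    optimizer.step()
    return loss.item()
```

Regressing on logits rather than on post-softmax probabilities preserves the relative scores the deep model assigns to all classes, which is part of what makes these soft targets easier for the shallow model to learn than the original hard labels.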
To make such wide shallow mimics practical to train, the paper also introduces a linear bottleneck layer between the input and the wide hidden layer: the single large weight matrix is factored into two much smaller matrices, which speeds up training and reduces memory (a sketch of this factorization follows at the end of this summary).

Experiments on TIMIT and CIFAR-10 show that shallow models trained with model compression can match or exceed the performance of deep models. For example, a shallow mimic with 8,000 hidden units can match a deep model with a similar number of parameters, and a mimic with 400,000 hidden units can match a deep convolutional network.

The study suggests that the complexity of a learned function is not necessarily tied to the depth or size of the model needed to represent it: shallow models can learn complex functions if trained properly, and the remaining gap between shallow and deep models may shrink further with better regularization and optimization. The paper concludes that much of the benefit of deep learning may come from a good match between deep architectures and current training procedures, and that better learning algorithms could allow shallow models to reach higher accuracy directly. Since shallow models can be trained to mimic deep ones, depth may not always be essential for learning complex functions.
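The sketch below illustrates the linear bottleneck idea, again with assumed sizes rather than the paper's exact configuration: instead of one dense input-to-hidden matrix, the student stores a small linear layer of k units followed by the expansion into the wide hidden layer, with no non-linearity in between.

```python
import torch.nn as nn

# Assumed sizes for illustration: with a bottleneck of k units, the factored
# input weights cost k*(input_dim + hidden_dim) parameters instead of
# input_dim*hidden_dim for a single dense matrix -- a large saving when
# k is much smaller than both dimensions.
input_dim, bottleneck_k, hidden_dim, num_classes = 1845, 250, 8000, 183

student_with_bottleneck = nn.Sequential(
    nn.Linear(input_dim, bottleneck_k, bias=False),  # purely linear bottleneck
    nn.Linear(bottleneck_k, hidden_dim),             # expands into the wide hidden layer
    nn.Tanh(),                                       # non-linearity only after the expansion
    nn.Linear(hidden_dim, num_classes),              # logits for mimic training as before
)

# Because both bottleneck layers are linear, their product is mathematically a
# single (rank-limited) input_dim x hidden_dim matrix, so after training they
# could be multiplied back together into one weight matrix if desired.
```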