20 Dec 2015 | Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov
This paper addresses the computational and memory demands of deep neural networks, particularly focusing on fully-connected layers. The authors propose a method to compress the dense weight matrices of these layers using the Tensor Train (TT) format, which significantly reduces the number of parameters while preserving the layer's expressive power. For Very Deep VGG networks, the compression factor of the dense weight matrix can reach up to 200,000 times, leading to a total compression factor of 7 for the entire network. The TT-layer is designed to be compatible with existing training algorithms and backpropagation, and it allows for the use of much wider layers, potentially increasing the model's expressive power.
Experimental results on datasets such as MNIST, CIFAR-10, and ImageNet demonstrate that the compressed networks achieve similar performance to their uncompressed counterparts but with a much smaller number of parameters. The paper also discusses the advantages of the TT-layer in terms of inference time and memory usage, making it suitable for real-time applications and mobile devices.
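To get an intuition for where the compression comes from, it helps to count parameters. In the TT format, a dense weight matrix is replaced by a chain of small cores whose sizes depend on how the layer's input/output dimensions are factored into modes and on the chosen TT-ranks. The sketch below is a hypothetical illustration (not the authors' code): it compares the parameter count of a dense 4096-to-4096 layer with a TT parameterization of the same layer, assuming each side is factored as 8×8×8×8 and a uniform TT-rank of 4.

```python
from math import prod

def dense_params(m_modes, n_modes):
    # A dense weight matrix mapping prod(m_modes) inputs to prod(n_modes) outputs.
    return prod(m_modes) * prod(n_modes)

def tt_params(m_modes, n_modes, rank):
    # In the TT-matrix format, core G_k has shape r_{k-1} x (m_k * n_k) x r_k,
    # with boundary ranks r_0 = r_d = 1, so the total parameter count is the
    # sum of the core sizes.
    d = len(m_modes)
    ranks = [1] + [rank] * (d - 1) + [1]
    return sum(ranks[k] * m_modes[k] * n_modes[k] * ranks[k + 1]
               for k in range(d))

# A 4096 -> 4096 fully-connected layer, factored as 8*8*8*8 on each side.
modes = [8, 8, 8, 8]
print(dense_params(modes, modes))  # 16777216
print(tt_params(modes, modes, 4))  # 2560
```

Under these (illustrative) choices the TT parameterization stores 2,560 numbers instead of about 16.8 million, a compression of roughly 6,500×; the much larger factors quoted in the paper come from wider layers and aggressive rank choices.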