1 Mar 2010 | Dan Claudiu Cireșan, Ueli Meier, Luca Maria Gambardella, Jürgen Schmidhuber
Deep neural networks, specifically multi-layer perceptrons (MLPs), excel at recognizing handwritten digits, particularly on the MNIST benchmark. This study demonstrates that an MLP with many hidden layers and many neurons per layer, trained on deformed training images with the help of graphics processing units (GPUs), can reach a very low error rate of 0.35% on MNIST. Training large MLPs with plain back-propagation (BP) has long been considered difficult, partly because gradients tend to vanish as the number of layers grows; the GPU implementation speeds up training enough that this obstacle is overcome in practice. Continually deforming the training images effectively enlarges the training set, making the network more robust to variations in the input. The best MLP, with 12.11 million free parameters, achieves a test error of 0.35%, improving on previously published results. The results also underscore the importance of GPU acceleration for training deep MLPs, since it allows far shorter training times and better performance than CPU training, and the paper concludes that the approach can be applied to other pattern-recognition tasks.
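To make the core idea concrete, here is a minimal sketch of a deep MLP trained with plain online back-propagation. This is not the authors' GPU code: the layer sizes, learning rate, tanh/softmax choices, and the random data standing in for (possibly deformed) MNIST digits are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical architecture, loosely in the spirit of the paper's deep MLPs.
layer_sizes = [784, 1000, 500, 10]

# Small random weights, zero biases.
weights = [rng.normal(0, 0.05, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Forward pass; returns the activations of every layer."""
    acts = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ W + b
        if i < len(weights) - 1:
            a = np.tanh(z)                 # tanh hidden units
        else:
            e = np.exp(z - z.max())        # softmax output
            a = e / e.sum()
        acts.append(a)
    return acts

def backprop(x, target, lr=0.01):
    """One plain online-BP update on a single (image, label) pair."""
    acts = forward(x)
    delta = acts[-1] - target              # dL/dz for softmax + cross-entropy
    for i in reversed(range(len(weights))):
        grad_W = np.outer(acts[i], delta)
        grad_b = delta
        if i > 0:
            # Backpropagate through the tanh layer below,
            # using the pre-update weights.
            delta = (delta @ weights[i].T) * (1.0 - acts[i] ** 2)
        weights[i] -= lr * grad_W
        biases[i] -= lr * grad_b

# Toy usage: random vectors stand in for deformed MNIST images.
for _ in range(100):
    x = rng.random(784)
    target = np.zeros(10)
    target[rng.integers(10)] = 1.0
    backprop(x, target)
```

In the paper's setting, each epoch would feed freshly deformed training images through updates like the one above, and the whole loop would run on a GPU; this sketch only shows the plain-BP mechanics that the summary refers to.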