JUNE 2017 | Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton
This paper presents a deep convolutional neural network (CNN) that achieves state-of-the-art results in image classification on the ImageNet dataset. The network, with 60 million parameters and 650,000 neurons, consists of five convolutional layers, some followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To speed up training, the network uses non-saturating neurons and an efficient GPU implementation of the convolution operation. To reduce overfitting, the authors employ a regularization method called "dropout." The model achieves top-1 and top-5 error rates of 37.5% and 17.0% on the ImageNet LSVRC-2010 test set, significantly better than previous methods. The model was also entered in the ILSVRC-2012 competition, achieving a winning top-5 error rate of 15.3%.
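As a rough illustration, the layer stack described above can be sketched in PyTorch. This is a minimal single-GPU sketch, not the authors' original two-GPU CUDA implementation: the cross-GPU layer split is omitted, and the padding on the first convolution is an assumption chosen so a 224×224 input yields the paper's 6×6 final feature maps. Channel counts (96, 256, 384, 384, 256) and the two 4096-unit fully connected layers follow the paper's description.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Sketch of the paper's architecture: five convolutional layers
    (some followed by overlapping max-pooling), three fully connected
    layers, and a final 1000-way classifier."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # conv1 (padding assumed)
            nn.ReLU(inplace=True),                                   # non-saturating neurons
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),   # local response normalization
            nn.MaxPool2d(kernel_size=3, stride=2),                   # overlapping pooling (3x3, stride 2)
            nn.Conv2d(96, 256, kernel_size=5, padding=2),            # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),           # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),           # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # -> 256 x 6 x 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                                       # dropout regularization
            nn.Linear(256 * 6 * 6, 4096),                            # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                                   # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                            # fc8
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits; the 1000-way softmax is applied in the loss
```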
The paper discusses the evolution of deep neural networks, highlighting the importance of large datasets and computational power in training deep networks. It describes the architecture of the CNN, including the use of Rectified Linear Units (ReLUs), training on multiple GPUs, local response normalization, and overlapping pooling. The network's architecture is designed to handle the complexity of object recognition tasks, with a focus on reducing overfitting through data augmentation and dropout. The paper also details the training process, including the use of stochastic gradient descent with momentum and weight decay, sketched below. The results show that the CNN outperforms previous methods in image classification tasks, demonstrating the effectiveness of deep learning in computer vision. The paper concludes with a discussion on the future of deep learning, emphasizing the potential of large and deep convolutional networks in video processing and other applications.
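A hypothetical training setup, assuming PyTorch and the `AlexNetSketch` defined above. The hyperparameter values are taken from the paper (momentum 0.9, weight decay 0.0005, initial learning rate 0.01 divided by 10 when validation error plateaus, batch size 128, dropout 0.5), but note that PyTorch's SGD applies momentum and weight decay in a slightly different form than the paper's exact update rule, and the paper's PCA-based color augmentation is omitted from this sketch.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation in the spirit of the paper: random 224x224 crops
# from 256x256 images plus horizontal reflections.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = AlexNetSketch()
criterion = nn.CrossEntropyLoss()  # applies the 1000-way softmax internally

# SGD with momentum and weight decay, using the paper's reported values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
# The paper divides the learning rate by 10 when validation error
# stops improving; ReduceLROnPlateau approximates that schedule
# (call scheduler.step(val_error) once per epoch).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One minibatch update (the paper used batches of 128 examples)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```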