Rethinking the Inception Architecture for Computer Vision

11 Dec 2015 | Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna
This paper presents a new approach to scaling up convolutional networks for computer vision tasks, focusing on improving efficiency without sacrificing performance. The authors propose several design principles: avoid representational bottlenecks, prefer higher-dimensional representations for local processing, perform spatial aggregation over lower-dimensional embeddings, and balance network width and depth. Building on these principles, they introduce factorized convolutions, which reduce computation by breaking large convolutions into sequences of smaller ones: a 5x5 convolution becomes two stacked 3x3 convolutions, and an n x n convolution becomes a 1 x n followed by an n x 1. This significantly reduces computational cost and parameter count while maintaining accuracy (sketched in the first code example below).

The authors also revisit auxiliary classifiers, finding that they act primarily as regularizers rather than accelerators of early convergence, especially when the auxiliary branch is batch-normalized (second example below). Additionally, they describe an efficient grid-size reduction that halves spatial resolution using parallel stride-2 convolution and pooling branches, avoiding a representational bottleneck without the cost of first expanding the representation (third example below).

The resulting architecture, Inception-v2 (referred to as Inception-v3 once all refinements are combined), achieves state-of-the-art results on the ILSVRC 2012 classification benchmark, with 21.2% top-1 and 5.6% top-5 error for single-crop evaluation. An ensemble of four models with multi-crop evaluation reduces these to 17.3% top-1 and 3.5% top-5 error. The authors further demonstrate that high-quality results can be achieved with receptive fields as small as 79x79, which may be useful for detecting small objects. The paper concludes that these techniques provide a more efficient and effective way to scale up convolutional networks for computer vision tasks.
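The factorizations described above are easy to illustrate. The following is a minimal PyTorch sketch, not the authors' code; the channel count (64) and the 17x17 grid are illustrative choices, not the paper's exact configuration. It shows a 5x5 convolution alongside its two-layer 3x3 replacement and an asymmetric 1x7 + 7x1 pair, all producing identically shaped outputs:

```python
import torch
import torch.nn as nn

# A 5x5 convolution has the same receptive field as two stacked 3x3
# convolutions, but costs 25 / (9 + 9) ~ 1.39x more per output position.
five_by_five = nn.Conv2d(64, 64, kernel_size=5, padding=2)

factorized_5x5 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

# Asymmetric factorization: an n x n convolution becomes a 1 x n followed
# by an n x 1, cutting per-position cost from n^2 to 2n (here, n = 7).
factorized_7x7 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 7), padding=(0, 3)),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=(7, 1), padding=(3, 0)),
)

x = torch.randn(1, 64, 17, 17)
assert five_by_five(x).shape == factorized_5x5(x).shape == factorized_7x7(x).shape

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(five_by_five), params(factorized_5x5), params(factorized_7x7))
# 102464 (5x5) vs 73856 (two 3x3) vs 57472 (1x7 + 7x1)
```

The parameter counts printed at the end make the savings concrete: the two-layer 3x3 stack uses roughly 28% fewer parameters than the 5x5 it replaces, and the asymmetric pair saves even more.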
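The auxiliary-classifier finding can be sketched as a side head attached to an intermediate feature map. This is an assumption-laden illustration: the layer sizes below mimic the general shape of an Inception-style auxiliary head but are not the paper's exact head, and the 0.3 loss weight mentioned in the comment comes from the original GoogLeNet paper rather than this one:

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Side classification head attached to an intermediate 17x17 feature
    map. The paper argues such heads act mainly as regularizers rather
    than aiding early convergence, and that batch-normalizing the head
    ("BN-auxiliary") improves the main classifier's top-1 accuracy."""

    def __init__(self, in_channels: int, num_classes: int = 1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),   # 17x17 -> 5x5
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.BatchNorm2d(128),                     # the "BN-auxiliary" idea
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 5 * 5, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x)

# During training, the auxiliary loss is added to the main loss with a
# small weight (0.3 in the original GoogLeNet); at inference the head
# is discarded.
aux = AuxClassifier(768)
logits = aux(torch.randn(2, 768, 17, 17))
print(logits.shape)  # torch.Size([2, 1000])
```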
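Finally, the efficient grid-size reduction can be shown as two parallel stride-2 branches whose outputs are concatenated. Again a minimal sketch under assumed channel counts (320 in, 320 per branch) rather than the paper's exact block; the key point is that pooling and convolution run in parallel on the full-resolution input, so the network neither pools first (a representational bottleneck) nor widens first (computationally expensive):

```python
import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Halve the spatial grid while expanding channels. A stride-2
    convolution branch and a stride-2 pooling branch both see the
    full-resolution input; concatenation doubles the channel count
    cheaply: out_channels = conv_channels + in_channels."""

    def __init__(self, in_channels: int, conv_channels: int):
        super().__init__()
        # Stride-2 convolution branch operating directly on the input.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_channels, conv_channels, kernel_size=3, stride=2),
            nn.ReLU(inplace=True),
        )
        # Parallel stride-2 pooling branch keeping the original channels.
        self.pool_branch = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

x = torch.randn(1, 320, 35, 35)
y = GridReduction(320, 320)(x)
print(y.shape)  # torch.Size([1, 640, 17, 17])
```

Note how the 35x35 grid is reduced to 17x17 in a single step while the channel count grows from 320 to 640, mirroring the grid transitions in the Inception-v2/v3 architecture.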