This paper presents a study of the effect of convolutional network depth on accuracy in the large-scale image recognition setting. The authors investigate increasing network depth in an architecture built from small (3x3) convolution filters, finding that networks with 16-19 weight layers significantly outperform prior art. These findings formed the basis of the team's ILSVRC 2014 submission, which took first and second place in the localization and classification tracks, respectively. The models also generalize well to other datasets, achieving state-of-the-art results, and the two best-performing ConvNet models were made publicly available for further research.
The paper introduces ConvNet configurations built entirely from small (3x3) filters, which makes it possible to add depth without a large increase in parameters: a stack of several 3x3 convolutions covers the same receptive field as a single larger filter while using fewer parameters and more non-linearities. Each configuration ends with three fully connected layers, the first two with 4096 channels and the third performing 1000-way classification. All hidden layers use ReLU activation, and local response normalization is omitted, as it was found not to improve performance on the ILSVRC dataset.
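To make the configuration concrete, the following is a minimal sketch of a VGG-16-style network (the 16-weight-layer variant), assuming PyTorch; the class and helper names are illustrative and not the authors' released models.

```python
# Minimal sketch of a VGG-16-style network, assuming PyTorch.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, num_convs):
    """Stack of 3x3 convolutions (stride 1, pad 1) followed by 2x2 max-pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class VGG16Sketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64, 2),     # 224 -> 112
            conv_block(64, 128, 2),   # 112 -> 56
            conv_block(128, 256, 3),  # 56 -> 28
            conv_block(256, 512, 3),  # 28 -> 14
            conv_block(512, 512, 3),  # 14 -> 7
        )
        # Three fully connected layers: 4096, 4096, then 1000-way classification.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a single 224x224 RGB image produces 1000 class scores.
logits = VGG16Sketch()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```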
The authors evaluated their models on the ILSVRC-2012 dataset and found that classification accuracy improves as network depth increases. They also found that scale jittering at training time, as well as evaluation over multiple test scales, improves performance over using a single fixed scale. Multi-crop evaluation performed slightly better than dense evaluation, and combining the two worked best. The best-performing submission achieved 7.3% top-5 test error on the ILSVRC 2014 classification task.
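The training-time scale jittering mentioned above can be sketched as follows, assuming PyTorch/torchvision; the scale range [256, 512] follows the paper, while the transform pipeline itself is an illustrative assumption rather than the original training code.

```python
# Hedged sketch of training-time scale jittering, assuming torchvision.
import random
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF

class ScaleJitter:
    """Resample the training scale S for every image."""
    def __init__(self, s_min=256, s_max=512):
        self.s_min, self.s_max = s_min, s_max

    def __call__(self, img):
        s = random.randint(self.s_min, self.s_max)
        # Isotropically rescale so the shorter image side equals S.
        return TF.resize(img, s)

train_transform = transforms.Compose([
    ScaleJitter(256, 512),              # random S in [256, 512] per image
    transforms.RandomCrop(224),         # random 224x224 training crop
    transforms.RandomHorizontalFlip(),  # flip augmentation
    transforms.ToTensor(),
])
```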
The authors also applied their networks to the object localization task, achieving 25.3% test error. They found that per-class regression outperformed single-class regression, and that testing at multiple scales and combining the predictions of several networks further improved performance. The learned image representations were also evaluated on other datasets, including VOC-2007, VOC-2012, Caltech-101, and Caltech-256, where they achieved state-of-the-art results, beating previous approaches by more than 6% on VOC-2012 and by 8.6% on Caltech-256. These results demonstrate the effectiveness of deep convolutional networks in image recognition tasks.
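The difference between the two localization variants comes down to the width of the final regression layer; the following is a hedged sketch, assuming PyTorch, with hypothetical variable names.

```python
# Illustrative sketch of single-class vs. per-class bounding-box regression heads.
import torch
import torch.nn as nn

NUM_CLASSES = 1000
FEATURE_DIM = 4096  # width of the last hidden fully connected layer

# Single-class regression (SCR): one box shared across all classes -> 4 outputs.
scr_head = nn.Linear(FEATURE_DIM, 4)

# Per-class regression (PCR): a separate box per class -> 4 * 1000 outputs.
# The summary above notes that PCR outperformed the class-agnostic SCR variant.
pcr_head = nn.Linear(FEATURE_DIM, 4 * NUM_CLASSES)

features = torch.randn(1, FEATURE_DIM)
boxes_per_class = pcr_head(features).view(1, NUM_CLASSES, 4)  # one box per class
```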