SSD: Single Shot MultiBox Detector

The paper introduces SSD (Single Shot MultiBox Detector), a method for object detection in images using a single deep neural network. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to better match the object shape. SSD combines predictions from multiple feature maps with different resolutions to handle objects of various sizes. The method eliminates the need for object proposals, making it simpler and faster compared to methods that require proposal generation.
Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets show that SSD achieves accuracy competitive with methods that use object proposals while being significantly faster. For a $300 \times 300$ input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS, and for a $512 \times 512$ input, it achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD is also more accurate even with a smaller input image size. The code for SSD is available at <https://github.com/weiliu89/caffe/tree/ssd>.
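
To make the idea of default boxes concrete, the following is a minimal sketch of how a fixed set of boxes might be laid out over several feature maps of different resolutions. The function name `default_boxes`, the feature map sizes, the scales, and the aspect ratios are illustrative assumptions chosen for this example, not values taken from the summary above or from the released code.

```python
import math
from itertools import product


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return default boxes as (cx, cy, w, h) in normalized [0, 1] coordinates.

    One box per aspect ratio is centered on every cell of a square
    fmap_size x fmap_size feature map.
    """
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        # Center of the cell, normalized by the feature map size.
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Width and height keep the area near scale**2 while varying shape.
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes


# Hypothetical configuration: (feature map size, box scale) pairs.
# Coarser maps get larger scales so that they cover larger objects.
LAYERS = [(8, 0.2), (4, 0.45), (2, 0.7), (1, 0.95)]
ASPECT_RATIOS = [1.0, 2.0, 0.5]

all_boxes = []
for size, scale in LAYERS:
    all_boxes.extend(default_boxes(size, scale, ASPECT_RATIOS))

print(len(all_boxes))  # (64 + 16 + 4 + 1) cells * 3 ratios = 255
```

In a detector built along these lines, the network would output one set of class scores and four box adjustments for each entry in `all_boxes`, so the number of predictions is fixed by the configuration and no separate proposal step is needed.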

SSD: Single Shot MultiBox Detector

2016 | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg