DSSD : Deconvolutional Single Shot Detector

23 Jan 2017 | Cheng-Yang Fu¹*, Wei Liu¹*, Ananth Ranga², Ambrish Tyagi², Alexander C. Berg¹
This paper introduces DSSD, a Deconvolutional Single Shot Detector, which improves the accuracy of state-of-the-art object detection by adding additional context. The approach combines a state-of-the-art classifier (Residual-101) with a fast detection framework (SSD), and augments SSD+Residual-101 with deconvolution layers that introduce additional large-scale context and improve accuracy, especially on small objects. The paper shows that carefully adding extra stages of learned transformations, specifically a module with feed-forward connections in the deconvolution path and a new output module, is what makes this approach work, and suggests a way forward for further detection research.

Results are reported on both PASCAL VOC and COCO detection. DSSD with 513×513 input achieves 81.5% mAP on VOC2007 test, 80.0% mAP on VOC2012 test, and 33.2% mAP on COCO, outperforming the state-of-the-art R-FCN on each dataset.

The paper also surveys related work, including other object detection methods and the use of encoder-decoder networks for context integration. The DSSD model is built on SSD with Residual-101 and adds deconvolution layers to increase feature-map resolution and inject more semantic information into the shallower prediction layers; a sketch of such a module is given below. Different prediction modules and deconvolution modules are evaluated in combination, and DSSD outperforms the other methods on both VOC and COCO. Finally, the paper discusses inference time and shows visualization results, indicating that DSSD reaches state-of-the-art accuracy while maintaining reasonable speed compared to other detectors.
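To make the deconvolution idea concrete, here is a minimal PyTorch-style sketch of a DSSD-like deconvolution module, written from the paper's description rather than any released code. The class name DeconvModule, the parameter names (deep_ch, shallow_ch, out_ch), and the exact channel widths are illustrative assumptions; the structure follows the paper's design, in which an upsampled deep feature map is fused with a shallower SSD feature map by element-wise product.

```python
import torch
import torch.nn as nn

class DeconvModule(nn.Module):
    """Sketch of a DSSD-style deconvolution module: a deep, low-resolution
    feature map is upsampled by a learned deconvolution and fused with a
    shallower, higher-resolution SSD feature map."""

    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int = 512):
        super().__init__()
        # Deconvolution path: a 2x2 stride-2 deconvolution doubles the
        # spatial resolution, then a 3x3 conv + BN refines the result.
        self.deconv_path = nn.Sequential(
            nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Feed-forward path from the SSD layer: convs + BN bring the
        # shallow feature map to the same channel width.
        self.feat_path = nn.Sequential(
            nn.Conv2d(shallow_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Element-wise product fuses the two paths; the paper found this
        # variant slightly more accurate than element-wise sum.
        return self.relu(self.deconv_path(deep) * self.feat_path(shallow))

# Usage (shapes are illustrative): fuse a 5x5 deep map with a 10x10 SSD map.
m = DeconvModule(deep_ch=1024, shallow_ch=512)
out = m(torch.randn(1, 1024, 5, 5), torch.randn(1, 512, 10, 10))  # -> (1, 512, 10, 10)
```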
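The "new output module" mentioned above replaces SSD's direct per-layer prediction with a small residual block ahead of the classification and box-regression heads. The sketch below, again an assumption-laden illustration rather than the authors' implementation (the widths mid_ch and out_ch and the head shapes follow the paper's figures and SSD conventions), shows the idea.

```python
class PredictionModule(nn.Module):
    """Sketch of a DSSD-style residual prediction module: a residual block
    is inserted before the per-anchor class and box heads, instead of
    predicting directly from the feature map as plain SSD does."""

    def __init__(self, in_ch: int, num_anchors: int, num_classes: int):
        super().__init__()
        mid_ch, out_ch = 256, 1024  # illustrative widths
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),
        )
        # 1x1 skip connection matches the channel count for the sum.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        # As in SSD: per-anchor class scores and 4 box offsets per location.
        self.cls_head = nn.Conv2d(out_ch, num_anchors * num_classes,
                                  kernel_size=3, padding=1)
        self.box_head = nn.Conv2d(out_ch, num_anchors * 4,
                                  kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor):
        feat = self.relu(self.skip(x) + self.residual(x))
        return self.cls_head(feat), self.box_head(feat)
```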