25 Jul 2016 | Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, and Nuno Vasconcelos
The paper introduces a unified multi-scale deep convolutional neural network (MS-CNN) for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network, which are learned end-to-end with a multi-task loss. The proposal sub-network performs detection at multiple output layers, so that each layer's receptive field matches objects of a different scale, while the detection sub-network uses an RoI pooling layer and a deconvolution layer to improve detection accuracy. The MS-CNN achieves state-of-the-art performance on datasets such as KITTI and Caltech, with recall rates over 95% for small objects and speeds of up to 15 fps. Upsampling features through deconvolution, rather than upsampling the input image, reduces memory and computation costs, making the MS-CNN efficient enough for real-time object detection.
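The two ideas in the summary, matching objects to output layers by scale and upsampling features with deconvolution, can be illustrated with a minimal sketch. The branch names, template sizes, and deconvolution hyperparameters below are assumptions chosen for illustration and are not taken from the paper; only the transposed-convolution output-size formula is standard.

```python
import math

# Hypothetical detection branches and their nominal template sizes in pixels.
# These names and values are illustrative assumptions, not the paper's settings.
BRANCH_TEMPLATE_SIZES = {
    "branch_stride_8": 40,
    "branch_stride_16": 80,
    "branch_stride_32": 160,
    "branch_stride_64": 320,
}


def assign_branch(object_height, templates=BRANCH_TEMPLATE_SIZES):
    """Pick the branch whose template size is closest to the object in log scale."""
    return min(
        templates,
        key=lambda name: abs(math.log(object_height / templates[name])),
    )


def deconv_output_size(input_size, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution (no dilation, no output padding)."""
    return (input_size - 1) * stride - 2 * padding + kernel


if __name__ == "__main__":
    for height in (45, 100, 300):
        print(height, "->", assign_branch(height))
    # With kernel=4, stride=2, padding=1 the feature map is doubled in size.
    print(deconv_output_size(20))
```

Under these assumed settings, small objects are routed to the shallow, high-resolution branch and large objects to the deeper branches, while a stride-2 deconvolution doubles the feature-map resolution without enlarging the input image.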