16 Apr 2019 | Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le
This paper proposes NAS-FPN, a scalable feature pyramid architecture for object detection discovered through neural architecture search (NAS). The discovered architecture combines top-down and bottom-up connections to fuse features across scales, achieving a better accuracy-latency tradeoff than state-of-the-art models. NAS-FPN works well with a variety of backbones, including MobileNet, ResNet, and AmoebaNet. Combined with MobileNetV2 in the RetinaNet framework, it outperforms SSDLite by 2 AP; with an AmoebaNet-D backbone it reaches 48.3 AP, surpassing Mask R-CNN while requiring less computation. Because the pyramid network is modular, it can be stacked multiple times for better accuracy.
The method uses a modular search space built from "merging cells" to discover efficient cross-scale connections: each cell selects two feature levels, resizes them to a target resolution, and fuses them with a binary operation, and the repeated pyramid structure also enables anytime detection (early exit from intermediate pyramids). A reinforcement learning controller samples architectures from this search space, receives the detection accuracy of each sampled child model as its reward, and learns over time to generate better architectures; the best discovered architecture is then used for object detection. Evaluated on the COCO dataset, NAS-FPN shows significant improvements in detection accuracy and efficiency, demonstrating that it is a flexible and effective architecture for object detection.
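The cross-scale feature fusion at the heart of NAS-FPN can be sketched as a "merging cell": two feature maps from different pyramid levels are resized to a common resolution and combined with a binary operation (sum, or a global-pooling attention gate). The helper names below (`resize_to`, `merge_cell`) are illustrative, not from the paper's code, and the NumPy resizing is a stand-in for the nearest-neighbor upsampling and max-pooling used in practice:

```python
import numpy as np

def resize_to(feat, target_hw):
    """Resize a CHW feature map: nearest-neighbor upsample or max-pool downsample.
    Assumes target dims are integer multiples/divisors of the input dims."""
    c, h, w = feat.shape
    th, tw = target_hw
    if th >= h:  # upsample by repeating pixels (nearest neighbor)
        return feat.repeat(th // h, axis=1).repeat(tw // w, axis=2)
    sy, sx = h // th, w // tw  # downsample by strided max pooling
    return feat[:, :th * sy, :tw * sx].reshape(c, th, sy, tw, sx).max(axis=(2, 4))

def merge_cell(a, b, out_hw, op):
    """One merging cell: bring both inputs to out_hw, then fuse with a binary op."""
    a, b = resize_to(a, out_hw), resize_to(b, out_hw)
    if op == "sum":
        return a + b
    if op == "global_attention":
        # global average pooling on one input produces a per-channel sigmoid
        # gate that scales the other input before summation
        gate = 1.0 / (1.0 + np.exp(-a.mean(axis=(1, 2), keepdims=True)))
        return a + gate * b
    raise ValueError(f"unknown op: {op}")
```

The search space then amounts to choosing, for each cell, which two levels to merge, the output resolution, and which of the two binary ops to apply; merged outputs are pushed back into the candidate pool for later cells.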
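The controller's learning loop can be illustrated with a toy REINFORCE policy over a single discrete architecture choice. This is a deliberately simplified stand-in (the paper's controller is an RNN sampling many decisions per architecture, with detection AP as the reward); `reward_fn` here is a hypothetical proxy for training and evaluating a sampled child model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_and_train_controller(reward_fn, num_choices, steps=200, lr=0.1):
    """Toy REINFORCE loop: a softmax policy over discrete architecture choices,
    nudged toward samples whose reward beats an exponential moving baseline."""
    logits = np.zeros(num_choices)
    baseline = 0.0
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        choice = rng.choice(num_choices, p=probs)
        reward = reward_fn(choice)           # e.g. validation AP of the sampled FPN
        advantage = reward - baseline        # center rewards to reduce variance
        baseline = 0.9 * baseline + 0.1 * reward
        grad = -probs                        # ∇ log π(choice) for a softmax policy
        grad[choice] += 1.0
        logits += lr * advantage * grad
    return int(np.argmax(logits))            # best architecture found
```

Over many iterations the policy concentrates probability mass on high-reward choices, mirroring how the controller comes to generate better feature pyramid architectures as the search proceeds.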