FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
24 May 2019 | Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer
Designing accurate and efficient ConvNets for mobile devices is challenging due to the combinatorially large design space. Previous neural architecture search (NAS) methods are computationally expensive. The optimal ConvNet architecture depends on factors such as input resolution and target device, yet existing approaches are too resource-demanding for case-by-case redesign. Previous work also focuses on reducing FLOPs, but FLOP count does not always reflect actual latency. To address these issues, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding the need to enumerate and train individual architectures separately as in previous methods.

FBNets (Facebook-Berkeley-Nets), a family of models discovered by DNAS, surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, making it 2.4x smaller and 1.5x faster than MobileNetV2-1.3. Despite achieving higher accuracy and lower latency than MnasNet, FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU hours. Searched for different input resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy with 2.9 ms latency (345 frames per second) on a Samsung S8. An iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X over a Samsung-optimized FBNet. FBNet models are open-sourced at https://github.com/facebookresearch/mobile-vision.

DNAS explores a layer-wise search space in which each layer can choose a different block. The search space is represented by a stochastic super net, which is trained with SGD to optimize the architecture distribution; optimal architectures are then sampled from the trained distribution.
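The core relaxation behind the stochastic super net can be sketched in a few lines: each layer computes a probability-weighted sum of all candidate block outputs, where the weights come from a softmax over learnable architecture parameters. The toy "blocks" and parameter values below are purely illustrative stand-ins (real FBNet blocks are inverted-residual convolutions); this is a minimal sketch of the idea, not the paper's implementation.

```python
import numpy as np

def softmax(theta):
    """Turn raw architecture parameters into a probability distribution."""
    e = np.exp(theta - theta.max())
    return e / e.sum()

# Hypothetical candidate blocks for one layer of the super net.
# Stand-ins for the paper's 9 candidate inverted-residual blocks.
candidate_blocks = [
    lambda x: 0.5 * x,   # stand-in for block 0
    lambda x: x + 1.0,   # stand-in for block 1
    lambda x: x * x,     # stand-in for block 2
]

def supernet_layer(x, theta):
    """Relaxed layer output: the expectation of the candidate block
    outputs under the architecture distribution. Because the weights
    are a softmax of theta, the output is differentiable with respect
    to the architecture parameters, so SGD can optimize them."""
    probs = softmax(theta)
    return sum(p * block(x) for p, block in zip(probs, candidate_blocks))

theta = np.array([0.0, 2.0, -1.0])           # illustrative parameters
y = supernet_layer(np.array([1.0, 2.0]), theta)
```

After training, the block with the highest probability at each layer (or a sample from the distribution) is kept, yielding a discrete architecture.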
The latency of each candidate operator is measured on the target device and used to compute the loss for the super net. The search space is defined by a macro-architecture with 22 layers and 9 candidate blocks per layer. The loss function considers both accuracy and latency; the latency term is estimated with a lookup table model, making it differentiable with respect to the layer-wise block choices and allowing the problem to be solved with gradient-based optimization. Experiments show that FBNets outperform state-of-the-art models in both accuracy and efficiency: FBNet-B achieves 74.1% top-1 accuracy with 295M FLOPs and 23.1 ms latency on a Samsung S8, 2.4x smaller and 1.5x faster than MobileNetV2-1.3.
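The lookup-table latency model described above can be sketched as follows: each block's latency is benchmarked once per layer on the target device, and the network's latency estimate is the expectation of those measurements under the architecture distribution, summed over layers. The table values, layer count, and loss constants below are illustrative, not measurements from the paper.

```python
import numpy as np

def softmax(theta, axis=-1):
    e = np.exp(theta - theta.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical lookup table of measured latencies (ms):
# latency_table[l, i] = latency of candidate block i at layer l,
# benchmarked once on the target device before the search.
latency_table = np.array([
    [1.2, 2.5, 0.8],   # layer 0
    [0.9, 1.7, 1.1],   # layer 1
])

def expected_latency(thetas, table):
    """Differentiable latency estimate: at each layer, take the
    expected block latency under the architecture distribution,
    then sum over layers."""
    probs = softmax(thetas, axis=1)        # shape: (layers, blocks)
    return (probs * table).sum()

def latency_aware_loss(ce_loss, thetas, table, alpha=0.2, beta=0.6):
    """Multiplicative latency-aware objective in the spirit of the
    paper: CE * alpha * log(LAT)^beta. The constants alpha and beta
    here are illustrative placeholders."""
    lat = expected_latency(thetas, table)
    return ce_loss * alpha * np.log(lat) ** beta
```

Because `expected_latency` is a smooth function of the architecture parameters, its gradient flows back through the softmax during super-net training, steering the search toward blocks that are both accurate and fast on the target device.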