16 Apr 2024 | Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, and Andrew Howard
MobileNetV4 (MNv4) is a new generation of efficient mobile neural networks designed for universally strong performance across mobile hardware. At its core is the Universal Inverted Bottleneck (UIB) block, a flexible and efficient structure that unifies the Inverted Bottleneck, ConvNext, Feed-Forward Network, and a new Extra Depthwise (ExtraDW) variant within a single block template. The paper also presents Mobile MQA, an attention block tailored to mobile accelerators that delivers a 39% inference speedup, along with an optimized neural architecture search (NAS) recipe that improves the effectiveness of the MNv4 search.

Together, UIB, Mobile MQA, and the refined NAS recipe yield a suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, and specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU. A novel distillation technique further improves accuracy: the MNv4-Hybrid-Large model reaches 87% top-1 accuracy on ImageNet-1K with a Pixel 8 EdgeTPU runtime of just 3.8 ms. The paper details the design of the MNv4 models, the UIB block, and Mobile MQA, and reports results on ImageNet classification and COCO object detection, demonstrating high accuracy and efficiency across diverse mobile hardware platforms.
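To make the UIB unification concrete: the paper describes UIB as an inverted bottleneck with two optional depthwise convolutions, one before the pointwise expansion and one between expansion and projection. Which of the four named variants you get depends on which optional slots are enabled. The helper below is an illustrative sketch of that mapping, not the authors' code; the names follow the paper.

```python
def uib_variant(dw_before_expansion: bool, dw_after_expansion: bool) -> str:
    """Map the two optional depthwise-conv slots of a UIB block to its variant.

    UIB = [optional depthwise] -> 1x1 expansion -> [optional depthwise]
          -> 1x1 projection, so four on/off combinations exist.
    """
    if dw_before_expansion and dw_after_expansion:
        return "ExtraDW"   # new variant: depthwise convs in both slots
    if dw_after_expansion:
        return "IB"        # classic Inverted Bottleneck (MobileNetV2-style)
    if dw_before_expansion:
        return "ConvNext"  # depthwise before expansion, ConvNext-style
    return "FFN"           # no depthwise: a plain pointwise feed-forward block
```

Letting the NAS choose these two slots per stage is what allows a single block template to cover all four structures.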
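Mobile MQA builds on multi-query attention (MQA), in which every query head shares a single key head and a single value head, shrinking the key/value tensors that dominate memory traffic on accelerators. The NumPy sketch below shows only that shared-KV core under simplified assumptions (single example, no masking, no spatial downsampling of keys/values, which Mobile MQA additionally applies); the function name and shapes are illustrative, not the paper's API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, wq, wk, wv, num_heads):
    """Multi-query attention: per-head queries, one shared key/value head.

    x:      (seq, d_model) input tokens
    wq:     (d_model, num_heads * d_head) query projection
    wk, wv: (d_model, d_head) single shared key/value head -- the
            memory saving MQA buys over standard multi-head attention
    """
    seq, _ = x.shape
    d_head = wk.shape[1]
    q = (x @ wq).reshape(seq, num_heads, d_head)     # (seq, H, d)
    k = x @ wk                                       # (seq, d), shared by all heads
    v = x @ wv                                       # (seq, d), shared by all heads
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)                  # (H, seq, seq)
    out = np.einsum("hst,td->shd", attn, v)          # (seq, H, d)
    return out.reshape(seq, num_heads * d_head)
```

Because k and v have no head dimension, their size is 1/H of the standard multi-head layout, which is the main source of the reported accelerator speedup.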