MobileNetV4 - Universal Models for the Mobile Ecosystem

16 Apr 2024 | Danfeng Qin†‡, Chas Leichner†‡, Manolis Delakis†, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, and Andrew Howard†§
The paper introduces MobileNetV4 (MNv4), a new generation of efficient neural network models designed for mobile devices. MNv4 features the Universal Inverted Bottleneck (UIB) block, which unifies and extends the Inverted Bottleneck (IB), ConvNext, and Feed Forward Network (FFN) structures, and adds a novel Extra Depthwise (ExtraDW) variant. The paper also presents Mobile MQA, an attention block optimized for mobile accelerators that achieves a 39% speedup over Multi-Head Attention, along with an optimized neural architecture search (NAS) recipe that improves the effectiveness of the MNv4 search.

Integrating UIB, Mobile MQA, and the refined NAS recipe yields a suite of MNv4 models that are mostly Pareto-optimal across mobile hardware platforms, including CPUs, DSPs, GPUs, and specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU. A novel distillation technique further boosts accuracy: the MNv4-Hybrid-L model reaches 87% ImageNet-1K accuracy with a Pixel 8 EdgeTPU runtime of just 3.8ms. The paper also contributes a theoretical framework and analysis for understanding model universality on heterogeneous devices, highlighting how MNv4 achieves efficient performance across different hardware platforms.
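To make the UIB structure concrete, below is a minimal PyTorch sketch of such a block; the class and argument names (UniversalInvertedBottleneck, start_dw_kernel, mid_dw_kernel) are illustrative assumptions, not the paper's reference implementation. The two optional depthwise convolutions, one before the pointwise expansion and one between expansion and projection, recover the four variants the paper names: IB (mid DW only), ConvNext-like (start DW only), FFN (neither), and ExtraDW (both).

```python
# Hedged sketch of a Universal Inverted Bottleneck (UIB) block.
# Names and defaults are assumptions for illustration only.
import torch
import torch.nn as nn


def conv_bn(in_ch, out_ch, kernel, stride=1, groups=1, act=True):
    """Conv + BatchNorm (+ ReLU), the standard mobile building block."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel, stride,
                  padding=kernel // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
    ]
    if act:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)


class UniversalInvertedBottleneck(nn.Module):
    def __init__(self, in_ch, out_ch, expand_ratio=4,
                 start_dw_kernel=0, mid_dw_kernel=3, stride=1):
        super().__init__()
        mid_ch = int(in_ch * expand_ratio)
        # Optional depthwise conv before the expansion (ConvNext-style).
        self.start_dw = (conv_bn(in_ch, in_ch, start_dw_kernel,
                                 groups=in_ch, act=False)
                         if start_dw_kernel else None)
        self.expand = conv_bn(in_ch, mid_ch, 1)  # pointwise expansion
        # Optional depthwise conv between expansion and projection (IB-style).
        self.mid_dw = (conv_bn(mid_ch, mid_ch, mid_dw_kernel,
                               stride=stride, groups=mid_ch)
                       if mid_dw_kernel else None)
        self.project = conv_bn(mid_ch, out_ch, 1, act=False)  # linear projection
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        shortcut = x
        if self.start_dw is not None:
            x = self.start_dw(x)
        x = self.expand(x)
        if self.mid_dw is not None:
            x = self.mid_dw(x)
        x = self.project(x)
        return x + shortcut if self.use_residual else x


# ExtraDW variant: depthwise convs both before and inside the bottleneck.
extra_dw = UniversalInvertedBottleneck(64, 64, start_dw_kernel=3, mid_dw_kernel=3)
out = extra_dw(torch.randn(1, 64, 56, 56))  # -> torch.Size([1, 64, 56, 56])
```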
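Mobile MQA builds on Multi-Query Attention, in which all query heads share a single key head and a single value head, and adds spatial downsampling of keys and values while queries keep full resolution. The sketch below is again a hedged PyTorch illustration with assumed names (MobileMQA, kv_stride) rather than the paper's exact implementation; it shows both the shared KV head and the asymmetric KV downsampling.

```python
# Hedged sketch of Multi-Query Attention with spatial KV downsampling.
# Names, the stride-2 depthwise downsampler, and defaults are assumptions.
import torch
import torch.nn as nn


class MobileMQA(nn.Module):
    def __init__(self, dim, num_heads=8, kv_stride=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Conv2d(dim, dim, 1, bias=False)
        # Single shared key/value head: what distinguishes MQA from MHA.
        self.k = nn.Conv2d(dim, self.head_dim, 1, bias=False)
        self.v = nn.Conv2d(dim, self.head_dim, 1, bias=False)
        # Asymmetric spatial downsampling of K/V; queries keep full resolution.
        self.kv_down = (nn.Conv2d(dim, dim, 3, stride=kv_stride, padding=1,
                                  groups=dim, bias=False)
                        if kv_stride > 1 else nn.Identity())
        self.out = nn.Conv2d(dim, dim, 1, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).reshape(b, self.num_heads, self.head_dim, h * w)
        kv_in = self.kv_down(x)
        k = self.k(kv_in).flatten(2)  # (b, head_dim, hw')
        v = self.v(kv_in).flatten(2)  # (b, head_dim, hw')
        # All query heads attend against the same shared K and V.
        attn = torch.einsum('bndq,bdk->bnqk', q, k) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        out = torch.einsum('bnqk,bdk->bndq', attn, v)
        return self.out(out.reshape(b, c, h, w))


x = torch.randn(1, 64, 14, 14)
y = MobileMQA(dim=64, num_heads=8)(x)  # output shape matches the input
```

Sharing one K/V head shrinks the attention block's memory traffic, which is the bottleneck on mobile accelerators; downsampling K/V reduces it further at little accuracy cost, which is consistent with the 39% speedup the paper reports over Multi-Head Attention.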