Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge


April 2017 | Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang
The computation for intelligent personal assistants such as Apple Siri, Google Now, and Microsoft Cortana is currently performed in the cloud. This cloud-only approach requires significant data transfer over wireless networks and places heavy computational pressure on datacenters. As mobile devices grow more powerful and energy efficient, however, it is no longer clear that cloud-only processing is optimal. This paper investigates computation partitioning strategies that leverage both cloud and mobile resources to achieve low latency, low energy consumption, and high datacenter throughput for intelligent applications.

Using 8 intelligent applications spanning the computer vision, speech, and natural language domains, the paper first examines the feasibility of executing large deep neural networks (DNNs) entirely on a state-of-the-art mobile device. For some applications, local execution is up to 11× faster than the cloud-only approach. More importantly, a fine-grained, layer-level partitioning strategy based on a DNN's topology and the characteristics of its constituent layers achieves far better end-to-end latency and mobile energy efficiency than either extreme, and pushing compute out of the cloud onto mobile devices also improves datacenter throughput.

Building on this insight, the paper presents Neurosurgeon, a lightweight dynamic scheduler that automatically identifies the ideal partition point in a DNN and orchestrates the distribution of computation between the mobile device and the datacenter at the granularity of neural network layers. Neurosurgeon adapts to different DNN architectures, hardware platforms, wireless networks, and server load levels, partitioning computation for either best latency or best mobile energy. To do so, it uses performance prediction models that estimate the latency and power consumption of each DNN layer from its type and configuration. A minimal sketch of this layer-level split search appears below.
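The following Python sketch illustrates the kind of split search such a scheduler performs. It is an illustrative reconstruction, not the paper's implementation: the layer names, the profile numbers, and the single-uplink latency model are all assumptions, and the per-layer estimates would in practice come from prediction models like the one sketched at the end of this summary.

```python
# Illustrative sketch of layer-level DNN partitioning in the spirit of
# Neurosurgeon; all names and numbers below are assumptions for the example.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    mobile_ms: float   # predicted execution latency on the mobile device
    server_ms: float   # predicted execution latency in the datacenter
    out_bytes: int     # size of this layer's output activations

def best_partition(layers, uplink_bytes_per_ms, input_bytes):
    """Return (split, total_ms): layers[:split] run on the mobile device
    and layers[split:] in the cloud. split == 0 is cloud-only execution,
    split == len(layers) is mobile-only execution."""
    best_split, best_ms = 0, float("inf")
    for split in range(len(layers) + 1):
        mobile_ms = sum(l.mobile_ms for l in layers[:split])
        server_ms = sum(l.server_ms for l in layers[split:])
        if split == len(layers):      # everything local: nothing to send
            sent_bytes = 0
        elif split == 0:              # cloud-only: upload the raw input
            sent_bytes = input_bytes
        else:                         # upload the split layer's activations
            sent_bytes = layers[split - 1].out_bytes
        total_ms = mobile_ms + sent_bytes / uplink_bytes_per_ms + server_ms
        if total_ms < best_ms:
            best_split, best_ms = split, total_ms
    return best_split, best_ms

# Hypothetical profile of a small CNN over a ~2 MB/s uplink:
layers = [
    Layer("conv1", mobile_ms=12.0,  server_ms=1.5, out_bytes=800_000),
    Layer("pool1", mobile_ms=2.0,   server_ms=0.3, out_bytes=200_000),
    Layer("fc1",   mobile_ms=120.0, server_ms=2.0, out_bytes=16_000),
    Layer("fc2",   mobile_ms=5.0,   server_ms=0.4, out_bytes=4_000),
]
split, ms = best_partition(layers, uplink_bytes_per_ms=2000.0, input_bytes=600_000)
print(f"run layers[:{split}] on the phone, rest in the cloud ({ms:.1f} ms)")
```

Because convolutional and pooling layers tend to shrink the data volume while the later fully connected layers are compute heavy, the best split in this example (and frequently in practice) lands partway through the network rather than at either end, which is exactly the opportunity Neurosurgeon exploits.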
Neurosurgeon is evaluated on a state-of-the-art mobile development platform across Wi-Fi, LTE, and 3G wireless connections, on both CPU-only and GPU mobile configurations. Compared with the status quo cloud-only approach, it improves end-to-end latency by 3.1× on average and by up to 40.7×, reduces mobile energy consumption by 59.5% on average and by up to 94.7%, and improves datacenter throughput by 1.5× on average and by up to 6.7×. It also outperforms MAUI, a well-known computation offloading framework, by 1.9× on average and by up to 32×, and it remains robust to variations in wireless network conditions and server load.
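The per-layer latency estimates consumed by a split search like the one above have to come from somewhere; the paper builds performance prediction models per layer type from profiled measurements. The sketch below is a hedged stand-in assuming a simple linear regression of latency against a layer's arithmetic cost; the feature sets and model families in the paper may differ by layer type, and all numbers here are invented for illustration.

```python
# Hedged stand-in for a per-layer-type latency predictor: a linear fit of
# latency against a layer's arithmetic cost (GFLOPs), trained on profiled
# samples. The paper's actual predictors may use richer per-type features.
import numpy as np

class LayerLatencyModel:
    """Fits latency_ms ≈ slope * gflops + intercept for one layer type."""
    def __init__(self):
        self.slope, self.intercept = 0.0, 0.0

    def fit(self, gflops, latencies_ms):
        # Least-squares line through the profiled (cost, latency) samples.
        self.slope, self.intercept = np.polyfit(gflops, latencies_ms, deg=1)

    def predict(self, gflops):
        return self.slope * gflops + self.intercept

# One model per layer type, profiled once per hardware platform.
# The samples below are made-up mobile measurements for illustration.
conv_model = LayerLatencyModel()
conv_model.fit(gflops=[0.2, 0.8, 1.6, 3.2], latencies_ms=[4.1, 13.8, 27.5, 55.2])
print(conv_model.predict(1.0))  # estimated latency of a 1-GFLOP conv layer
```

Analogous models fit to power measurements, built the same way, would let the scheduler optimize for mobile energy instead of latency.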