2017 | Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang
The paper "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge" by Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang from the University of Michigan explores the integration of cloud and mobile edge computing for intelligent personal assistants (IPAs) such as Apple Siri, Google Now, and Microsoft Cortana. The authors examine the current approach of cloud-only processing and investigate computation partitioning strategies that leverage both cloud and mobile devices to achieve low latency, low energy consumption, and high datacenter throughput.
Key findings include:
- The data transfer overhead is often the bottleneck in cloud-only processing, leading to high latency and energy costs.
- Mobile devices can execute DNN-based applications locally, reducing latency and energy consumption.
- A fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN can significantly improve performance.
- Neurosurgeon, a lightweight scheduler, automatically partitions DNN computation between mobile devices and datacenters at the granularity of neural network layers, adapting to various DNN architectures, hardware platforms, wireless networks, and server load levels.
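The layer-level partitioning idea above can be sketched as a simple search: for each candidate split point, predict total latency as (mobile compute for the front layers) + (transfer of the intermediate data over the wireless link) + (cloud compute for the remaining layers), then pick the minimum. This is a minimal illustration, not Neurosurgeon's actual implementation; the function name, inputs, and the assumption that per-layer latencies are already available (the paper obtains them from per-layer-type regression models) are all hypothetical.

```python
def best_partition(mobile_lat, cloud_lat, out_bytes, input_bytes, bandwidth_bps):
    """Choose the split index k minimizing end-to-end latency when layers
    [0, k) run on the mobile device and layers [k, n) run in the cloud.

    mobile_lat[i], cloud_lat[i]: predicted latency (s) of layer i on each side.
    out_bytes[i]: size (bytes) of layer i's output.
    input_bytes: size (bytes) of the raw input (sent if everything is offloaded).
    bandwidth_bps: current uplink bandwidth in bits per second.
    Returns (k, predicted_latency_seconds).
    """
    n = len(mobile_lat)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):
        # Data crossing the network: the raw input if all layers are offloaded,
        # the last mobile-side layer's output otherwise, nothing if fully local.
        if k == 0:
            xfer = input_bytes
        elif k == n:
            xfer = 0
        else:
            xfer = out_bytes[k - 1]
        total = sum(mobile_lat[:k]) + xfer * 8 / bandwidth_bps + sum(cloud_lat[k:])
        if total < best_t:
            best_k, best_t = k, total
    return best_k, best_t
```

Because the search is over only n+1 candidate splits and uses cheap predicted latencies, it can be re-run at request time as bandwidth and server load change, which is what lets a scheduler like this adapt dynamically.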
The evaluation on 8 DNN-based intelligent applications shows that Neurosurgeon improves end-to-end latency by 3.1× on average and up to 40.7×, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5× on average and up to 6.7×. The paper also compares Neurosurgeon with other offloading frameworks, demonstrating its effectiveness in handling network variations and server load changes.