February 28-29, 2024 | Qifei Dong, Xiangliang Chen, Mahadev Satyanarayanan
This paper presents a method for leveraging cloud-based large language models (LLMs) to create latency-critical edge AI systems. The key idea is to use LLMs as offline compilers that generate task-specific code, so that no LLM sits in the runtime loop and latency stays low. The authors present three case studies to demonstrate this approach: wearable cognitive assistance, mission-centric drone flight, and real-time style transfer.
LLMs such as GPT are cloud-centric because of their high resource demands and the need to protect their intellectual property. Their end-to-end latencies, however, far exceed the tens of milliseconds that cyber-human (CH) and cyber-physical (CP) systems can tolerate. To bridge this gap, the authors propose using LLMs as compilers that generate code to be executed on edge devices, removing the need for real-time LLM access.
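As a concrete illustration of this pattern, the sketch below invokes a cloud LLM once at development time to emit a task-specific module that is then shipped to the edge device. This is a minimal sketch, not the authors' actual toolchain: the use of the OpenAI Python SDK, the model name, the prompt, and the file names are all illustrative assumptions.

```python
# Offline "LLM as compiler" pattern: the LLM is called once at build time,
# never at runtime. Assumes the OpenAI Python SDK; model and prompt are
# placeholders, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def compile_task(spec: str, out_path: str) -> None:
    """Run once at development time: turn a task spec into edge-side code."""
    resp = client.chat.completions.create(
        model="gpt-4",  # any capable cloud-hosted LLM
        messages=[
            {"role": "system",
             "content": "Emit only a self-contained Python module."},
            {"role": "user", "content": spec},
        ],
    )
    with open(out_path, "w") as f:
        f.write(resp.choices[0].message.content)


# The generated module is copied to the edge device and imported there; at
# runtime no request leaves the device, so the tens-of-milliseconds latency
# budget can be met without touching the cloud.
compile_task("Generate a finite state machine that guides a user "
             "through assembling a lamp, step by step.", "task_fsm.py")
```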
The paper discusses the challenges of using LLMs in edge computing, including the need for task-specific knowledge, the difficulty of creating effective prompts, and the importance of ensuring the correctness of generated code. The authors also highlight the potential of LLMs as powerful offline tools for creating latency-critical systems, while avoiding the IP issues associated with deploying LLMs outside the cloud.
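One plausible way to address the correctness concern is to gate generated code behind developer-written tests before it is ever deployed. The harness below is hypothetical: the module file, the `next_state` interface it expects, and the test cases are invented for illustration and are not described in the paper.

```python
# Hypothetical pre-deployment gate for LLM-generated code: import the
# generated module and reject it unless every developer-written test passes.
import importlib.util
import traceback


def load_generated(path: str):
    """Import the LLM-generated module from a file path."""
    spec = importlib.util.spec_from_file_location("generated_task", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)  # raises if the code does not even run
    return mod


def validate(path: str, test_cases) -> bool:
    """Return True only if the module loads and every test case passes."""
    try:
        mod = load_generated(path)
    except Exception:
        traceback.print_exc()
        return False
    return all(mod.next_state(s, e) == want for s, e, want in test_cases)


# Developer-supplied oracle: (current_state, event) -> expected next state.
cases = [("start", "base_placed", "attach_pole"),
         ("attach_pole", "pole_attached", "add_shade")]
if not validate("task_fsm.py", cases):
    raise SystemExit("Generated code failed validation; fix the prompt and regenerate.")
```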
The case studies show how LLMs simplify the development of edge AI applications. In wearable cognitive assistance, an LLM generates the finite state machine that guides a user through a task. In mission-centric drone flight, it generates mission scripts. In real-time style transfer, it discovers artworks that match a user's description. A sketch of the first case follows below.
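To make the wearable cognitive assistance case concrete, here is a minimal sketch of what a generated finite state machine might look like for a step-by-step assembly task. The states, events, and instructions are invented; real systems of this kind drive transitions from vision models rather than string events.

```python
# Illustrative shape of an LLM-generated FSM for a guided assembly task.
# Each (state, event) pair maps to the next state; each state carries the
# instruction spoken or displayed to the user.
TRANSITIONS = {
    ("start",       "base_placed"):   "attach_pole",
    ("attach_pole", "pole_attached"): "add_shade",
    ("add_shade",   "shade_added"):   "done",
}
INSTRUCTIONS = {
    "start":       "Place the lamp base on the table.",
    "attach_pole": "Screw the pole into the base.",
    "add_shade":   "Clip the shade onto the pole.",
    "done":        "Assembly complete.",
}


def next_state(state: str, event: str) -> str:
    # Stay in the current state on an unrecognized event (e.g., a misdetection).
    return TRANSITIONS.get((state, event), state)


state = "start"
for event in ["base_placed", "pole_attached", "shade_added"]:
    state = next_state(state, event)
    print(INSTRUCTIONS[state])
```

Because the FSM is plain data plus a lookup, it runs on the edge device in microseconds; the expensive LLM reasoning happened once, offline, when the table was generated.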
The paper concludes that LLMs can serve effectively as offline compilers for edge AI systems. The authors point to future work on improving the accuracy and efficiency of LLM-based code generation, on verifying the correctness of generated code, and on extending the approach to a wider range of edge computing applications.