February 28-29, 2024 | Qifei Dong, Xiangliang Chen, Mahadev Satyanarayanan
This paper presents a method for leveraging cloud-based large language models (LLMs) to create latency-critical edge AI systems. The key idea is to use LLMs as offline compilers that generate task-specific code, so that no LLM sits in the runtime loop and latency stays low. The authors present three case studies to demonstrate this approach: wearable cognitive assistance, mission-centric drone flight, and real-time style transfer.
LLMs such as GPT are cloud-centric because of their high resource demands and the need to protect their intellectual property. Their end-to-end latencies, however, far exceed the tens of milliseconds that cyber-human (CH) and cyber-physical (CP) systems can tolerate. To bridge this gap, the authors propose using LLMs as compilers that generate code to be executed on edge devices, removing the need for real-time LLM access.
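As a concrete illustration of this pattern, the sketch below invokes a cloud LLM once at development time to emit a task-specific module that is then shipped to the edge device. This is a minimal sketch, not the authors' actual toolchain: the use of the OpenAI Python SDK, the model name, the prompt, and the file names are all illustrative assumptions.

```python
# Offline "LLM as compiler" pattern: the LLM is called once at build time,
# never at runtime. Assumes the OpenAI Python SDK; model and prompt are
# placeholders, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def compile_task(spec: str, out_path: str) -> None:
    """Run once at development time: turn a task spec into edge-side code."""
    resp = client.chat.completions.create(
        model="gpt-4",  # any capable cloud-hosted LLM
        messages=[
            {"role": "system",
             "content": "Emit only a self-contained Python module."},
            {"role": "user", "content": spec},
        ],
    )
    with open(out_path, "w") as f:
        f.write(resp.choices[0].message.content)


# The generated module is copied to the edge device and imported there; at
# runtime no request leaves the device, so the tens-of-milliseconds latency
# budget can be met without touching the cloud.
compile_task("Generate a finite state machine that guides a user "
             "through assembling a lamp, step by step.", "task_fsm.py")
```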
The paper discusses the challenges of using LLMs in edge computing, including the need for task-specific knowledge, the difficulty of creating effective prompts, and the importance of ensuring the correctness of generated code. The authors also highlight the potential of LLMs as powerful offline tools for creating latency-critical systems, while avoiding the IP issues associated with deploying LLMs outside the cloud.
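One plausible way to address the correctness concern is to gate generated code behind developer-written tests before it is ever deployed. The harness below is hypothetical: the module file, the `next_state` interface it expects, and the test cases are invented for illustration and are not described in the paper.

```python
# Hypothetical pre-deployment gate for LLM-generated code: import the
# generated module and reject it unless every developer-written test passes.
import importlib.util
import traceback


def load_generated(path: str):
    """Import the LLM-generated module from a file path."""
    spec = importlib.util.spec_from_file_location("generated_task", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)  # raises if the code does not even run
    return mod


def validate(path: str, test_cases) -> bool:
    """Return True only if the module loads and every test case passes."""
    try:
        mod = load_generated(path)
    except Exception:
        traceback.print_exc()
        return False
    return all(mod.next_state(s, e) == want for s, e, want in test_cases)


# Developer-supplied oracle: (current_state, event) -> expected next state.
cases = [("start", "base_placed", "attach_pole"),
         ("attach_pole", "pole_attached", "add_shade")]
if not validate("task_fsm.py", cases):
    raise SystemExit("Generated code failed validation; fix the prompt and regenerate.")
```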
The case studies show how LLMs simplify the development of edge AI applications. In wearable cognitive assistance, an LLM generates the finite state machine that guides a user through a task. In mission-centric drone flight, it generates mission scripts. In real-time style transfer, it discovers artworks that match a user's description. A sketch of the first case follows below.
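To make the wearable cognitive assistance case concrete, here is a minimal sketch of what a generated finite state machine might look like for a step-by-step assembly task. The states, events, and instructions are invented; real systems of this kind drive transitions from vision models rather than string events.

```python
# Illustrative shape of an LLM-generated FSM for a guided assembly task.
# Each (state, event) pair maps to the next state; each state carries the
# instruction spoken or displayed to the user.
TRANSITIONS = {
    ("start",       "base_placed"):   "attach_pole",
    ("attach_pole", "pole_attached"): "add_shade",
    ("add_shade",   "shade_added"):   "done",
}
INSTRUCTIONS = {
    "start":       "Place the lamp base on the table.",
    "attach_pole": "Screw the pole into the base.",
    "add_shade":   "Clip the shade onto the pole.",
    "done":        "Assembly complete.",
}


def next_state(state: str, event: str) -> str:
    # Stay in the current state on an unrecognized event (e.g., a misdetection).
    return TRANSITIONS.get((state, event), state)


state = "start"
for event in ["base_placed", "pole_attached", "shade_added"]:
    state = next_state(state, event)
    print(INSTRUCTIONS[state])
```

Because the FSM is plain data plus a lookup, it runs on the edge device in microseconds; the expensive LLM reasoning happened once, offline, when the table was generated.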
The paper concludes that LLMs can serve effectively as offline compilers for edge AI systems. The authors point to future work on improving the accuracy and efficiency of LLM-based code generation, on verifying the correctness of generated code, and on extending the approach to a wider range of edge computing applications.