Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference

29 Mar 2024 | Jovan Stojkovic, Esha Choukse†, Chaojie Zhang†, Inigo Goiri†, Josep Torrellas
This paper addresses the growing energy consumption challenges associated with the widespread use of large language models (LLMs) in various industries. The authors explore the trade-offs between performance and energy efficiency in LLM inference, focusing on the impact of different knobs available to LLM inference providers. They characterize the energy consumption under various settings, including workload type, batching, and model parallelism, and analyze their effects on latency, throughput, and energy usage. The study highlights the dynamic and heterogeneous nature of LLM inference workloads, which differ significantly from traditional latency-critical applications. The authors propose strategies for adaptive resource allocation and minimizing configuration overheads to enhance energy efficiency without compromising performance. The paper provides valuable insights into optimizing energy usage in LLM inference environments, paving the way for sustainable and cost-effective deployment of these models in data centers.
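To make the kind of knob sweep described above concrete, here is a minimal, purely illustrative Python sketch: it iterates over two hypothetical knobs (batch size and tensor-parallel degree) and reports latency, throughput, and an energy-per-token estimate for each setting. The functions and constants (run_inference, AVG_POWER_W, the latency model) are stand-in stubs and assumptions, not the paper's actual measurement harness; a real study would serve live requests and read the accelerator's energy counters.

```python
# Illustrative sketch (not the paper's harness): sweep two inference knobs,
# batch size and tensor-parallel degree, and report latency, throughput, and
# an energy-per-token estimate for each setting. run_inference() is a stub
# latency model and AVG_POWER_W is an assumed constant; a real study would
# serve actual requests and read GPU energy counters (e.g., via NVML).

import itertools
import random

AVG_POWER_W = 300.0  # assumed average per-GPU power draw while serving (placeholder)


def run_inference(batch_size: int, tp_degree: int, tokens_per_request: int) -> float:
    """Stub latency model: bigger batches amortize overhead, parallelism speeds decode."""
    per_token = 0.02 / tp_degree                  # pretend seconds per decoded token
    batch_penalty = 1 + 0.05 * batch_size         # pretend contention as batches grow
    return tokens_per_request * per_token * batch_penalty + random.uniform(0, 0.01)


def sweep(batch_sizes, tp_degrees, tokens_per_request=128):
    """Return latency, throughput, and joules/token for every knob combination."""
    rows = []
    for bs, tp in itertools.product(batch_sizes, tp_degrees):
        latency_s = run_inference(bs, tp, tokens_per_request)
        tokens = bs * tokens_per_request
        energy_j = AVG_POWER_W * tp * latency_s   # crude estimate: power x GPUs x time
        rows.append({
            "batch_size": bs,
            "tp_degree": tp,
            "latency_s": round(latency_s, 3),
            "throughput_tok_s": round(tokens / latency_s, 1),
            "joules_per_token": round(energy_j / tokens, 4),
        })
    return rows


if __name__ == "__main__":
    for row in sweep(batch_sizes=[1, 8, 32], tp_degrees=[1, 2, 4]):
        print(row)
```

A sweep like this surfaces the core trade-off the paper characterizes: settings that maximize throughput or minimize latency are not necessarily the ones that minimize joules per token, which is why adaptive knob selection matters.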