TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks


20 May 2024 | Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang
This paper proposes TrimCaching, a novel model placement scheme for caching AI models in wireless edge networks. The key observation is that many AI models, such as convolutional neural networks and large language models, share a significant proportion of parameter blocks containing reusable knowledge, which enables more storage-efficient caching and a higher cache hit ratio. The paper formulates a parameter-sharing model placement problem that maximizes the cache hit ratio in multi-edge wireless networks by balancing the tradeoff between storage efficiency and service latency. The problem is shown to be a submodular maximization problem with submodular constraints, for which no polynomial-time algorithm with a constant approximation guarantee exists. The paper therefore studies a special case in which a small, fixed number of parameter blocks are shared across models, and develops a polynomial-time algorithm with a (1-ε)/2 approximation guarantee; for the general case, a greedy algorithm is developed. Simulation results show that TrimCaching significantly improves the cache hit ratio over state-of-the-art content caching schemes that do not exploit shared parameters in AI models.

In more detail, the paper introduces a parameter-sharing model caching framework for a typical wireless edge network with multiple edge servers and users. The framework places AI models on edge servers so as to serve as many user requests as possible by exploiting the parameter blocks shared among models. The cache hit ratio is defined as the probability that a user successfully downloads its requested AI model from an edge server within the latency constraint. The resulting cache hit ratio maximization problem is mapped to a known NP-hard problem, which establishes that no polynomial-time algorithm with a constant approximation guarantee exists. For the special case with a small fixed number of shared parameter blocks, the polynomial-time (1-ε)/2-approximation algorithm applies; for the general case, the greedy algorithm is used. Simulations confirm the effectiveness of TrimCaching in improving the cache hit ratio.
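To make the storage-efficiency idea concrete, here is a minimal sketch (not the paper's exact formulation) of how parameter sharing changes the storage accounting and how a cache hit ratio of this kind can be evaluated. All names (`model_blocks`, `block_size`, `reachable`, etc.) are hypothetical, and the latency constraint is abstracted into a per-user set of reachable servers:

```python
from typing import Dict, List, Set, Tuple

def storage_cost(cached_models: Set[str],
                 model_blocks: Dict[str, Set[str]],
                 block_size: Dict[str, float]) -> float:
    """Storage used on one edge server. Because models share parameter
    blocks, each block is stored once: the cost is the size of the UNION
    of blocks across cached models, not the sum of whole-model sizes."""
    blocks = set().union(*(model_blocks[m] for m in cached_models))
    return sum(block_size[b] for b in blocks)

def cache_hit_ratio(placement: Dict[str, Set[str]],
                    requests: List[Tuple[str, str]],
                    reachable: Dict[str, Set[str]]) -> float:
    """Fraction of (user, requested-model) pairs served by at least one
    edge server that caches the model; reachable[user] stands in for the
    servers that can deliver the model within the latency constraint."""
    if not requests:
        return 0.0
    hits = sum(1 for user, model in requests
               if any(model in placement[s] for s in reachable[user]))
    return hits / len(requests)
```

For example, if model A consists of blocks {backbone, head_A} and model B of {backbone, head_B}, caching both costs |backbone| + |head_A| + |head_B| rather than the sum of two full model sizes; this saving is what lets an edge server cache more models and raise the hit ratio.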
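For the general case, the summary only says a greedy algorithm is proposed. The sketch below, reusing `cache_hit_ratio` and the block-union storage accounting from above, shows one standard greedy shape for such a placement problem: repeatedly commit the (server, model) placement with the largest marginal gain in hit ratio, charging a model only for blocks the server does not already cache. This is an illustrative assumption, not necessarily the paper's exact algorithm:

```python
def greedy_placement(models, servers, capacity,
                     model_blocks, block_size, requests, reachable):
    """Greedy sketch: at each step, place the single feasible
    (server, model) pair with the largest marginal gain in cache hit
    ratio; stop when no feasible placement improves the objective."""
    placement = {s: set() for s in servers}
    while True:
        base = cache_hit_ratio(placement, requests, reachable)
        best_pair, best_gain = None, 0.0
        for s in servers:
            cached_blocks = set().union(*(model_blocks[m] for m in placement[s]))
            used = sum(block_size[b] for b in cached_blocks)
            for m in models:
                if m in placement[s]:
                    continue
                # Marginal storage: only blocks s does not already cache.
                extra = sum(block_size[b] for b in model_blocks[m] - cached_blocks)
                if used + extra > capacity[s]:
                    continue  # would overflow this server's storage budget
                placement[s].add(m)
                gain = cache_hit_ratio(placement, requests, reachable) - base
                placement[s].remove(m)
                if gain > best_gain:
                    best_pair, best_gain = (s, m), gain
        if best_pair is None:
            return placement
        s, m = best_pair
        placement[s].add(m)
```

Because shared blocks make a model's marginal storage cost depend on what a server already holds, both the objective and the constraint behave submodularly, which is consistent with the hardness result summarized above.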