TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks


20 May 2024 | Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang
This paper proposes TrimCaching, a novel model placement scheme for caching AI models in wireless edge networks. The key observation is that many AI models, such as convolutional neural networks and large language models, share a significant proportion of parameter blocks containing reusable knowledge, which enables more storage-efficient caching and a higher cache hit ratio. The paper formulates a parameter-sharing model placement problem that maximizes the cache hit ratio in multi-edge wireless networks by balancing the tradeoff between storage efficiency and service latency. The problem is shown to be a submodular maximization problem with submodular constraints, for which no polynomial-time algorithm with a constant approximation guarantee exists. The paper therefore studies a special case in which a small, fixed number of parameter blocks are shared across models, and develops a polynomial-time algorithm with a (1-ε)/2 approximation guarantee; for the general case, a greedy algorithm is developed. Simulation results show that TrimCaching significantly improves the cache hit ratio over state-of-the-art content caching schemes that do not exploit shared parameters in AI models.

In more detail, the paper introduces a parameter-sharing model caching framework for a typical wireless edge network with multiple edge servers and users. The framework places AI models on edge servers so as to serve as many user requests as possible by exploiting the parameter blocks shared among models. The cache hit ratio is defined as the probability that a user successfully downloads its requested AI model from an edge server within the latency constraint. The resulting cache hit ratio maximization problem is mapped to a known NP-hard problem, which establishes that no polynomial-time algorithm with a constant approximation guarantee exists. For the special case with a small fixed number of shared parameter blocks, the polynomial-time (1-ε)/2-approximation algorithm applies; for the general case, the greedy algorithm is used. Simulations confirm the effectiveness of TrimCaching in improving the cache hit ratio.
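To make the storage-efficiency idea concrete, here is a minimal sketch (not the paper's exact formulation) of how parameter sharing changes the storage accounting and how a cache hit ratio of this kind can be evaluated. All names (`model_blocks`, `block_size`, `reachable`, etc.) are hypothetical, and the latency constraint is abstracted into a per-user set of reachable servers:

```python
from typing import Dict, List, Set, Tuple

def storage_cost(cached_models: Set[str],
                 model_blocks: Dict[str, Set[str]],
                 block_size: Dict[str, float]) -> float:
    """Storage used on one edge server. Because models share parameter
    blocks, each block is stored once: the cost is the size of the UNION
    of blocks across cached models, not the sum of whole-model sizes."""
    blocks = set().union(*(model_blocks[m] for m in cached_models))
    return sum(block_size[b] for b in blocks)

def cache_hit_ratio(placement: Dict[str, Set[str]],
                    requests: List[Tuple[str, str]],
                    reachable: Dict[str, Set[str]]) -> float:
    """Fraction of (user, requested-model) pairs served by at least one
    edge server that caches the model; reachable[user] stands in for the
    servers that can deliver the model within the latency constraint."""
    if not requests:
        return 0.0
    hits = sum(1 for user, model in requests
               if any(model in placement[s] for s in reachable[user]))
    return hits / len(requests)
```

For example, if model A consists of blocks {backbone, head_A} and model B of {backbone, head_B}, caching both costs |backbone| + |head_A| + |head_B| rather than the sum of two full model sizes; this saving is what lets an edge server cache more models and raise the hit ratio.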
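For the general case, the summary only says a greedy algorithm is proposed. The sketch below, reusing `cache_hit_ratio` and the block-union storage accounting from above, shows one standard greedy shape for such a placement problem: repeatedly commit the (server, model) placement with the largest marginal gain in hit ratio, charging a model only for blocks the server does not already cache. This is an illustrative assumption, not necessarily the paper's exact algorithm:

```python
def greedy_placement(models, servers, capacity,
                     model_blocks, block_size, requests, reachable):
    """Greedy sketch: at each step, place the single feasible
    (server, model) pair with the largest marginal gain in cache hit
    ratio; stop when no feasible placement improves the objective."""
    placement = {s: set() for s in servers}
    while True:
        base = cache_hit_ratio(placement, requests, reachable)
        best_pair, best_gain = None, 0.0
        for s in servers:
            cached_blocks = set().union(*(model_blocks[m] for m in placement[s]))
            used = sum(block_size[b] for b in cached_blocks)
            for m in models:
                if m in placement[s]:
                    continue
                # Marginal storage: only blocks s does not already cache.
                extra = sum(block_size[b] for b in model_blocks[m] - cached_blocks)
                if used + extra > capacity[s]:
                    continue  # would overflow this server's storage budget
                placement[s].add(m)
                gain = cache_hit_ratio(placement, requests, reachable) - base
                placement[s].remove(m)
                if gain > best_gain:
                    best_pair, best_gain = (s, m), gain
        if best_pair is None:
            return placement
        s, m = best_pair
        placement[s].add(m)
```

Because shared blocks make a model's marginal storage cost depend on what a server already holds, both the objective and the constraint behave submodularly, which is consistent with the hardness result summarized above.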