On the Placement of Web Server Replicas


Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker
This paper explores the problem of Web server replica placement in content distribution networks (CDNs). We develop several placement algorithms that use workload information, such as client latency and request rates, to make informed placement decisions. We evaluate the placement algorithms using both synthetic and real network topologies, as well as Web server traces, and show that the placement of Web replicas is crucial to CDN performance. We also address a number of practical issues that arise when using these algorithms, such as their sensitivity to imperfect knowledge about client workload and network topology, the stability of the input data, and methods for obtaining the input.

We propose several algorithms that can automate the server placement decision. The objective is to choose M replicas (or hosting services) among N potential sites (N > M) such that some objective function is optimized under a given traffic pattern. The objective function can be to minimize clients' latency, to minimize total bandwidth consumption, or to minimize an overall cost if each link is associated with a cost. We evaluate the performance of the various placement algorithms by simulating their behavior on synthetic and real network topologies and on several access traces from large commercial and government Web servers. We also study the sensitivity of the placement algorithms to imperfect information about client workload characteristics, along with other practical issues that arise when using these algorithms online in a content distribution network. Based upon our results, we conclude that a greedy algorithm for the Web server replica placement problem can provide content distribution networks with performance that is close to optimal.
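The greedy algorithm described above can be sketched in a few lines. This is an illustrative implementation, not the paper's code: the function and variable names (`greedy_placement`, `dist`, `load`) are assumptions, and the cost being minimized is taken to be the request-rate-weighted distance from each client to its nearest replica, one instance of the objective functions mentioned above.

```python
def greedy_placement(sites, clients, dist, load, M):
    """Greedily pick M replica sites out of N candidates.

    sites:   list of candidate site ids (N of them)
    clients: list of client ids
    dist[c][s]: distance (e.g., latency) from client c to site s
    load[c]: request rate of client c

    Each round adds the single site that, combined with the sites
    already chosen, yields the lowest total weighted distance:
        cost = sum over clients of load[c] * min distance to a replica.
    """
    chosen = []
    for _ in range(M):
        best_site, best_cost = None, float("inf")
        for s in sites:
            if s in chosen:
                continue
            trial = chosen + [s]
            cost = sum(load[c] * min(dist[c][t] for t in trial)
                       for c in clients)
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen
```

Each of the M rounds evaluates every remaining candidate against all clients, so the sketch runs in O(M * N * |clients| * M) time; the paper's conclusion is that this simple procedure gets within a small constant factor of the optimal placement.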
Although the greedy algorithm depends upon estimates of client distance and load, we find that it is relatively insensitive to errors in these estimates and is therefore a viable algorithm for use in the general Internet environment, where workload information will always be imperfect.

Our main results and conclusions are:

- Placement algorithms should incorporate client workload information, such as client distance and request rate, in their placement decisions. Such algorithms consistently perform a factor of 2–5 better than a workload-oblivious random algorithm.
- A greedy algorithm that places replicas based upon both a distance metric and request load performs best (its median performance is within a factor of 1.1–1.5 of optimal). A hot-spot algorithm based upon request load alone performs nearly as well (its median performance is within a factor of 1.6–2 of optimal). A tree-based algorithm developed for proxy cache hierarchies performs better than random placement, but not as well as the algorithms for general graph topologies.
- The placement algorithms are not very sensitive to noise in the estimates of distance and load used as inputs. Even with rough estimates of client distance and request load salted with random noise, the algorithms perform nearly as well as with perfect knowledge. For example, when the salted error is as high as a factor of 4, the greedy algorithm stays within a factor of 2 of the super-optimal in most cases.
- When deployed, the placement algorithms
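The hot-spot heuristic mentioned in the results can be sketched as well. This is a simplified reading, not the paper's exact procedure: it ranks candidate sites by the request load originating nearby and picks the top M. The names and the `radius` cutoff for "nearby" are assumptions introduced for illustration.

```python
def hotspot_placement(sites, clients, dist, load, M, radius):
    """Pick the M candidate sites closest to the heaviest demand.

    A site's score is the total request load of clients within
    `radius` of it (an assumed tunable for what counts as nearby).
    Unlike the greedy algorithm, each site is scored independently,
    so overlapping coverage between chosen sites is ignored.
    """
    def nearby_load(s):
        return sum(load[c] for c in clients if dist[c][s] <= radius)
    return sorted(sites, key=nearby_load, reverse=True)[:M]
```

Scoring sites independently is what makes this heuristic cheaper than the greedy algorithm, and also why it trails it slightly: two top-ranked sites may cover the same hot spot.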