Abstract: The goal of Content Delivery Networks (CDNs) is to serve content to end users with high performance. To do so, a CDN measures the latency on the paths from its servers to users and then selects a best available server for each user. For large CDNs, monitoring paths from thousands of servers to millions of users is challenging because of the sheer number of paths involved. In this paper, we address this problem and propose a framework to scale the task of path monitoring. Simply stated, the goal of our framework is to cluster IP addresses (clients) such that, within each cluster, the choice of best available server is the same (or similar). Finding a best available server for one client in a given cluster is then sufficient to assign that server to the rest of the clients in the cluster.
To achieve this goal, we first introduce two distance metrics that quantify how similar the server choices of any two given clients are. Second, we use a clustering method that is based on interdomain routing information. We evaluate the quality of our clusters using the metrics we introduce. We show that there is a strong correlation between the similarity in how two destination clients are routed to in the Internet and the similarity in their server selections. Finally, we show how to choose representative clients from each cluster, so that it is sufficient to learn the latencies from the CDN servers to the representative and, accordingly, to find a best available server for the rest of the clients in the same cluster.
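
As an illustration only, the representative-based assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method evaluated in the paper: the routing-derived cluster key (here an address prefix), the function names `cluster_by_key` and `assign_servers`, the `measure_latency` callback, and the rule for picking the representative are all hypothetical placeholders.

```python
from collections import defaultdict


def cluster_by_key(client_keys):
    """Group clients that share the same routing-derived key.

    client_keys maps a client IP (str) to a hypothetical cluster key,
    for example an address prefix taken from interdomain routing data.
    """
    clusters = defaultdict(list)
    for client, key in client_keys.items():
        clusters[key].append(client)
    return dict(clusters)


def assign_servers(clusters, servers, measure_latency):
    """Measure latency only to one representative per cluster and
    assign the lowest-latency server to every client in that cluster.

    measure_latency(server, client) is an assumed callback that returns
    a latency in milliseconds.
    """
    assignment = {}
    for members in clusters.values():
        # Placeholder rule: take the lexicographically smallest address.
        # The paper's own representative selection is not reproduced here.
        representative = min(members)
        best = min(servers, key=lambda s: measure_latency(s, representative))
        for client in members:
            assignment[client] = best
    return assignment
```

With two servers and three clients falling into two clusters, this sketch probes four server-to-representative paths instead of all six server-to-client paths; the saving grows with cluster size.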