Gonca Gürsun, Mark Crovella, Ibrahim Matta
Proceedings of INFOCOM 2011
Publication year: 2011

Abstract: Computer systems are increasingly driven by workloads that reflect large-scale social behavior, such as rapid changes in the popularity of media items like videos. Capacity planners and system designers must plan for rapid, massive changes in workloads when such social behavior is a factor. In this paper we make two contributions intended to assist in the design and provisioning of such systems. We analyze an extensive dataset consisting of the daily access counts of hundreds of thousands of YouTube videos. In this dataset, we find that there are two types of videos: those that show rapid changes in popularity, and those that are consistently popular over long time periods. We call these two types rarely-accessed and frequently-accessed videos, respectively. We observe that most of the videos in our dataset clearly fall into one of these two types. In this work, we study the frequently-accessed videos by asking two questions: first, is there a relatively simple model that can describe their daily access patterns? And second, can we use this simple model to predict the number of accesses that a video will have in the near future, as a tool for capacity planning? To answer these questions we develop a framework for characterization and forecasting of access patterns. We show that for frequently-accessed videos, daily access patterns can be extracted via principal component analysis, and used efficiently for forecasting.
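
The idea of extracting daily access patterns with principal component analysis and reusing them for forecasting can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not specify the model, so the function names (`fit_principal_patterns`, `forecast_tail`), the synthetic weekly-plus-trend data, the choice of two components, and the least-squares projection step are all assumptions made for illustration.

```python
import numpy as np


def fit_principal_patterns(counts, k):
    """Extract k principal access patterns from a (n_videos, n_days) matrix.

    Hypothetical helper: returns the per-day mean and a (k, n_days) array
    whose rows are the top-k principal directions of the centered data.
    """
    mean = counts.mean(axis=0)
    centered = counts - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def forecast_tail(history, mean, components):
    """Estimate the unobserved days of a video from its first h days.

    Assumed approach: fit component weights to the observed days by least
    squares, then extend the same weights over the remaining days.
    """
    h = history.shape[0]
    basis_head = components[:, :h].T  # shape (h, k)
    weights, *_ = np.linalg.lstsq(basis_head, history - mean[:h], rcond=None)
    return mean[h:] + weights @ components[:, h:]


# Synthetic stand-in for frequently-accessed videos (not real YouTube data):
# each video is a scaled weekly cycle plus a linear trend plus small noise.
rng = np.random.default_rng(0)
days = np.arange(28)
weekly = 1.0 + 0.3 * np.sin(2 * np.pi * days / 7)
trend = days / 28.0
levels = rng.uniform(100, 1000, size=(200, 1))
slopes = rng.uniform(-50, 50, size=(200, 1))
train = levels * weekly + slopes * trend + rng.normal(0, 1.0, size=(200, 28))

mean, components = fit_principal_patterns(train, k=2)

# Forecast the last 7 days of an unseen video from its first 21 days.
new_video = 500 * weekly + 20 * trend
predicted = forecast_tail(new_video[:21], mean, components)
rel_error = np.abs(predicted - new_video[21:]).max() / new_video[21:].max()
print(f"max relative forecast error: {rel_error:.4f}")
```

The sketch works well here only because the synthetic videos share a low-dimensional structure by construction; whether real access counts behave this way is the empirical question the paper addresses.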