Predicting the popularity of online content

4 Nov 2008 | Gabor Szabo, Bernardo A. Huberman
This paper presents a method for predicting the long-term popularity of online content from early measurements of user access. Using data from two content-sharing platforms, YouTube and Digg, the authors show that modeling the accumulation of views and votes makes it possible to forecast the long-term dynamics of individual submissions from their initial data. For Digg, measuring access to stories during the first two hours after submission allows accurate 30-day predictions, while YouTube videos require 10 days of data to reach similar accuracy. The difference in time scales is attributed to how content is consumed on each platform: Digg stories quickly become outdated, while YouTube videos remain popular long after their initial submission. Predictions are therefore more accurate for content whose attention decays rapidly, while evergreen content is prone to larger errors.
The approach relies on a logarithmic transformation of the popularity counts, which reveals strong correlations between popularity at early and later times and makes future popularity possible to model. Three prediction models are presented: the LN model, which uses linear regression on a logarithmic scale; the CS model, which uses constant scaling; and the GP model, which uses growth profiles. The CS model outperforms the others on relative error measures, while the LN model is better on absolute error measures.
The paper also discusses how popularity saturates over time: Digg stories tend to saturate quickly, while YouTube videos continue to gain views, a difference that again reflects how content is consumed on each platform. The study concludes that relative error measures are more suitable for community portals, as they provide more accurate predictions when the error of the prediction is estimated.
The authors also note that future research could explore different sections of Web 2.0 portals and the impact of user behavior on content popularity.
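The LN and CS models mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the log-scale regression has its slope fixed to 1, so only an offset is fitted, that the log-scale residuals are normally distributed (which motivates the mean correction), and that the CS factor is chosen to minimize the mean relative squared error. The function names and the sample counts are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_ln(early, late):
    """Fit a log-scale offset and residual variance (slope assumed to be 1)."""
    diffs = [math.log(l) - math.log(e) for e, l in zip(early, late)]
    beta0 = fmean(diffs)                 # average log-growth between the two times
    sigma2 = pvariance(diffs, mu=beta0)  # spread of the log-scale residuals
    return beta0, sigma2


def predict_ln(n_early, beta0, sigma2):
    """Predict the later count; sigma2 / 2 is the log-normal mean correction."""
    return math.exp(math.log(n_early) + beta0 + sigma2 / 2)


def fit_cs(early, late):
    """Constant scaling factor minimizing sum((alpha * e / l - 1) ** 2)."""
    ratios = [e / l for e, l in zip(early, late)]
    return sum(ratios) / sum(r * r for r in ratios)


def predict_cs(n_early, alpha):
    """Predict the later count by multiplying the early count by alpha."""
    return alpha * n_early


def relative_squared_error(predicted, actual):
    """Mean of ((prediction - actual) / actual) ** 2 over all submissions."""
    return fmean([((p - a) / a) ** 2 for p, a in zip(predicted, actual)])


# Made-up training data: counts at an early time and at the target time.
early_counts = [12, 40, 150, 900]
late_counts = [130, 390, 1600, 8800]

beta0, sigma2 = fit_ln(early_counts, late_counts)
alpha = fit_cs(early_counts, late_counts)

ln_preds = [predict_ln(n, beta0, sigma2) for n in early_counts]
cs_preds = [predict_cs(n, alpha) for n in early_counts]

print("LN relative squared error:", relative_squared_error(ln_preds, late_counts))
print("CS relative squared error:", relative_squared_error(cs_preds, late_counts))
```

On the training data, the CS factor gives the lowest relative squared error of any pure scaling of the early counts, because that error is exactly what it minimizes. The LN fit instead minimizes squared error on the logarithmic scale.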