Understanding Predicting the popularity of online content

The paper presents a method for predicting the long-term popularity of online content from early measurements of user access. Using YouTube and Digg as case studies, the authors show that modeling the accrual of views and votes makes it possible to forecast the popularity of individual submissions from initial data. For Digg, measuring access to a story during its first two hours is enough to forecast its popularity 30 days ahead with high accuracy; YouTube videos must be followed for 10 days to reach comparable accuracy. The authors attribute the differing time scales to how content is consumed on the two platforms: Digg stories quickly become outdated, while YouTube videos remain popular for longer.

The paper also shows that predictions are more accurate for submissions whose attention decays quickly, while predictions for evergreen content are prone to larger errors. The authors propose three prediction models and compare their performance: the linear regression model (LN) minimizes absolute squared error, the constant scaling model (CS) minimizes relative squared error, and the growth profile model (GP) serves as a point of comparison. The study concludes with a discussion of the saturation of popularity and the implications for advertising and content ranking.
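To make the difference between the two error criteria concrete, here is a minimal Python sketch of how a constant scaling fit and a log-scale linear fit could be computed from pairs of early and final counts. It is an illustration under stated assumptions, not the authors' implementation: the function names and the toy numbers are hypothetical, the log-scale fit assumes the regression is done on log-transformed counts with a slope fixed to one, and any noise-correction term the paper may apply is omitted.

```python
import math

def fit_constant_scaling(early, final):
    """Scale factor alpha minimizing the relative squared error
    sum((alpha * e / f - 1)**2) over training pairs (e, f).
    Setting the derivative to zero gives alpha = sum(r) / sum(r**2), r = e / f."""
    ratios = [e / f for e, f in zip(early, final)]
    return sum(ratios) / sum(r * r for r in ratios)

def fit_log_offset(early, final):
    """Offset beta minimizing sum((ln f - ln e - beta)**2).
    Assumption: unit slope on the log scale, so beta is the mean log-ratio."""
    diffs = [math.log(f) - math.log(e) for e, f in zip(early, final)]
    return sum(diffs) / len(diffs)

def predict_constant_scaling(alpha, n_early):
    """Forecast the final count by scaling the early count."""
    return alpha * n_early

def predict_log_offset(beta, n_early):
    """Forecast the final count by shifting the early count on the log scale."""
    return math.exp(math.log(n_early) + beta)

# Made-up toy data: counts at the early reference time and at the target time.
train_early = [10, 20, 40]
train_final = [20, 50, 80]

alpha = fit_constant_scaling(train_early, train_final)
beta = fit_log_offset(train_early, train_final)

new_early = 30  # hypothetical early count for a new submission
print("constant scaling forecast:", predict_constant_scaling(alpha, new_early))
print("log-offset forecast:", predict_log_offset(beta, new_early))
```

The two fits generally give slightly different multipliers on the same data, because one penalizes errors relative to the final count and the other penalizes squared errors in log space.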

Predicting the popularity of online content

4 Nov 2008 | Gabor Szabo, Bernardo A. Huberman