Another option is to adjust the daily data using the weekly time series, by dividing each daily search volume by the previous available weekly search volume. The resulting graph looks more like the weekly data.
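The adjustment can be sketched in a few lines. This is a minimal Python illustration, not the R functions linked from this post. The function name, the dictionary layout and the numbers are made up for the example. It also assumes that "previous available" means the most recent weekly observation dated on or before the day in question.

```python
from bisect import bisect_right
from datetime import date


def adjust_daily_by_weekly(daily, weekly):
    """Divide each daily value by the most recent weekly value on or before that day.

    daily  -- dict mapping date -> daily search volume
    weekly -- dict mapping week date -> weekly search volume
    Days with no earlier weekly value, or with a weekly value of zero, are skipped.
    """
    week_dates = sorted(weekly)
    adjusted = {}
    for day in sorted(daily):
        # Index of the last weekly observation dated on or before `day`
        idx = bisect_right(week_dates, day) - 1
        if idx < 0:
            continue  # no weekly observation available yet
        week_value = weekly[week_dates[idx]]
        if week_value == 0:
            continue  # avoid division by zero
        adjusted[day] = daily[day] / week_value
    return adjusted


# Illustrative numbers only, not real search volumes
daily = {date(2013, 1, 7): 50, date(2013, 1, 8): 60, date(2013, 1, 14): 80}
weekly = {date(2013, 1, 6): 40, date(2013, 1, 13): 80}
adjusted = adjust_daily_by_weekly(daily, weekly)
```

Skipping days without a usable weekly value is a design choice for the sketch; filling them with a missing-value marker would work equally well.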
A scatterplot of the two variables shows that they are correlated, but also that they differ substantially.
R functions
R code to execute functions
Daily data
Log(daily data)
Percentage change
Log(percentage change)
Weekly + daily data
Scatterplot of log(variables)
Intercept: -0.17***
Log(weekly adjusted): 0.18***
Scatterplot without outliers (no logarithm applied)
Intercept: 0.93***
Weekly adjusted: 0.04***
*** Significant at 0.1% level
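For readers who want to see how coefficients like the ones for the log scatterplot arise, here is a minimal Python sketch of an ordinary least squares fit on logged variables. It is not the R code used for this post: the function name and the data are invented, and it assumes that the weekly adjusted series is the explanatory variable, which the output above suggests but does not state. In a log-log fit the slope reads as an elasticity, so a slope of 0.18 would mean that a 1% increase in the explanatory variable is associated with an increase of roughly 0.18% in the other. The sketch does not compute significance levels.

```python
import math


def fit_log_log(x, y):
    """Ordinary least squares fit of log(y) = a + b * log(x).

    Returns (a, b). All values must be strictly positive,
    because the logarithm is undefined otherwise.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope = covariance of the logs / variance of log(x)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data that follow y = 2 * x**0.5 exactly
x = [1.0, 4.0, 9.0, 16.0]
y = [2.0, 4.0, 6.0, 8.0]
intercept, slope = fit_log_log(x, y)
```

On this constructed data the fit recovers a slope of 0.5 and an intercept of log(2), up to floating-point rounding.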
Conclusion
Starting from the weekly time series seems to produce a better-behaved time series, with a distribution closer to the normal distribution. Since we cannot say what the true distribution of the underlying sample is, we must choose the method based on what phenomenon we think the search data represent. It could, for instance, be argued that search data represent Internet users' attention towards a particular search term. The method for calculating the daily search volume should then be the one that corresponds most closely to what we believe is the actual attention of Internet users.
It could, for instance, be relevant to identify a particular topic where we already have good data on attention, and then measure how closely it corresponds to the search data.
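One simple way to measure that correspondence would be the correlation between an attention benchmark and each candidate daily series. The Python sketch below is only an illustration of the idea. The benchmark, the method names and all numbers are hypothetical, and Pearson correlation is just one possible measure.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def best_matching_series(benchmark, candidates):
    """Return the candidate name with the highest correlation, plus all scores."""
    scores = {name: pearson(benchmark, series) for name, series in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical numbers, chosen only to make the example easy to follow
benchmark = [10, 12, 15, 11, 20]       # independent measure of attention
candidates = {
    "method_a": [5, 9, 6, 8, 7],       # unrelated to the benchmark
    "method_b": [20, 24, 30, 22, 40],  # exactly twice the benchmark
}
best, scores = best_matching_series(benchmark, candidates)
```

Here `best` is "method_b", because that series is a rescaled copy of the benchmark. With real data the scores would be less clear-cut and would need to be read alongside the plots.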