Tuesday, November 25, 2014

Two methods for combining daily Google Trends data into longer time series

There are two methods for combining daily Google Trends data into time series longer than the 90 days provided. We can either start from the weekly data provided for 2004-present and use the daily data to fill in the gaps between the weekly observations, or we can chain the daily time series together based on their percentage change. The two methods give markedly different results. The percentage-change method is complicated by Google Trends values going to zero at times, since a percentage change from zero is undefined. To correct for this, we must assign some value to the zeros, which skews the data. To adjust for this skew, the natural logarithm is applied.
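The percentage-change method can be sketched roughly as follows. This is a minimal illustration in Python, not the R code linked from this post. It assumes that consecutive daily downloads overlap by exactly one day, and the function names and the `floor` value used to replace zeros are hypothetical choices made for the example.

```python
import math


def chain_by_pct_change(windows, floor=0.5):
    """Chain separately scaled daily windows into one continuous series.

    windows: list of lists of daily values, in time order. Assumption: the
    first day of each window is the same calendar day as the last day of
    the window before it.
    floor: hypothetical stand-in for zeros, so that day-over-day ratios
    are always defined.
    """
    chained = []
    for window in windows:
        # Raise anything below the floor (in practice, the zeros) to the floor.
        cleaned = [max(v, floor) for v in window]
        if not chained:
            chained.extend(cleaned)
            continue
        # Carry the level forward using day-over-day ratios only, so the
        # window's own 0-100 scaling drops out.
        level = chained[-1]
        for prev, cur in zip(cleaned, cleaned[1:]):
            level *= cur / prev
            chained.append(level)
    return chained


def log_series(values):
    """Natural logarithm of each (strictly positive) value."""
    return [math.log(v) for v in values]


# Two toy windows; the zero in the first one is replaced by the floor.
chained = chain_by_pct_change([[50, 100, 0], [10, 20, 40]])
logged = log_series(chained)
```

The choice of floor matters: a smaller value makes every move away from zero look like a larger percentage jump, which is the skew the logarithm is meant to dampen.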

Another option is to adjust the daily data based on the weekly time series, which creates a graph that looks more like the weekly data. This is done by dividing each daily search volume by the most recent available weekly search volume.
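Taking that description literally, a rough Python sketch of the weekly adjustment could look like this. It is an illustration under stated assumptions, not the R code linked from this post: "most recent available" is read as the latest weekly observation dated on or before the day in question, and the function name `adjust_by_weekly` is hypothetical.

```python
from bisect import bisect_right
from datetime import date


def adjust_by_weekly(daily, weekly):
    """Divide each daily value by the latest weekly value on or before it.

    daily, weekly: lists of (date, value) tuples, each sorted by date.
    Returns a list of (date, adjusted_value); the adjusted value is None
    when no earlier weekly observation exists or the weekly value is zero.
    """
    week_dates = [d for d, _ in weekly]
    adjusted = []
    for day, value in daily:
        # Index of the last weekly observation dated on or before `day`.
        i = bisect_right(week_dates, day) - 1
        if i < 0 or weekly[i][1] == 0:
            adjusted.append((day, None))
        else:
            adjusted.append((day, value / weekly[i][1]))
    return adjusted


# Toy data: two weekly observations and three daily ones.
weekly = [(date(2014, 11, 2), 50), (date(2014, 11, 9), 25)]
daily = [(date(2014, 11, 1), 10), (date(2014, 11, 3), 100), (date(2014, 11, 9), 50)]
result = adjust_by_weekly(daily, weekly)
```

Days that fall before the first weekly observation are left as None rather than guessed, since there is no weekly value to scale them against.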

The scatterplot between the two variables shows that they are correlated, but that there are significant differences.

R functions
R code to execute functions

Daily data

Log(daily data)

Percentage change

Log(percentage change)

Weekly + daily data


Scatterplot of log(variables)


Intercept: -0.17***
Log(weekly adjusted): 0.18***

Scatterplot without outliers (no logarithm applied)

Intercept: 0.93***
Weekly adjusted: 0.04***

*** Significant at 0.1% level
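Intercepts and slopes like the ones above come from fitting a straight line through the scatterplot. A minimal Python sketch of such a fit by ordinary least squares is shown below. It assumes the weekly-adjusted series is the explanatory variable and the percentage-change series is the dependent one, which is how the coefficient labels read. The function names are hypothetical, and the sketch does not compute significance levels.

```python
import math


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def log_log_fit(x, y):
    """Fit log(y) on log(x), skipping pairs where either value is not positive."""
    pairs = [(math.log(a), math.log(b)) for a, b in zip(x, y) if a > 0 and b > 0]
    log_x = [p[0] for p in pairs]
    log_y = [p[1] for p in pairs]
    return ols_fit(log_x, log_y)


# Toy data lying exactly on y = 1 + 2x.
intercept, slope = ols_fit([1, 2, 3], [3, 5, 7])
```

Dropping non-positive pairs before taking logarithms is one possible convention; replacing zeros with a small value, as in the percentage-change method, is another and would give different coefficients.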

Conclusion

It seems that starting from the weekly time series creates a time series that is better behaved, with a distribution closer to the normal distribution. Since we cannot say what the true distribution of the underlying sample is, we must choose the method based on what phenomenon we think the search data represent. It could, for instance, be argued that search data represent Internet users' attention towards a particular search term. The method for calculating the daily search volume should then be the one that corresponds most closely to what we believe is the actual attention of Internet users.

It could, for instance, be relevant to identify a particular topic where we already have good data on attention, and then measure how well each method's search series corresponds to it.