Friday, July 18, 2014

Google Trends and stock indexes


In order to use Google Trends data in financial analysis and forecasting, it is necessary to understand how to manipulate the data in the right way. Here, I demonstrate that there is a big difference between the average search pattern of an index's constituents and that of the index itself, i.e. averaging search data for the companies in the FTSE 100 index gives a very different pattern than the query for "FTSE 100".
Individual companies can have a high degree of correlation with the search data for the index, as is shown in the table at the bottom. On the whole, however, there does not seem to be a direct relationship between the two.

I have collected a data set consisting of the daily searches for stocks on the FTSE 100 since 2004. This is what daily search activity for the query "FTSE 100" looks like:

Out of the 100 companies on the FTSE 100, 36 have Google Trends data that is reasonably complete for the entire time period. First, let's take a look at what the average search activity has looked like:
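As a rough illustration of the averaging step, here is a minimal Python sketch. It is not the code used for this post: the function name `average_svi`, the dictionary layout and the numbers are all made up for the example, and it assumes that a day with no value for a company should be skipped for that company rather than counted as zero.

```python
from statistics import mean

def average_svi(series_by_company):
    """Average search volume across companies, date by date.

    series_by_company maps a company name to a {date: value} dict.
    A date where a company has no value (absent or None) is skipped
    for that company rather than treated as zero.
    """
    all_dates = sorted({d for s in series_by_company.values() for d in s})
    averaged = {}
    for d in all_dates:
        values = [s[d] for s in series_by_company.values()
                  if s.get(d) is not None]
        if values:
            averaged[d] = mean(values)
    return averaged

# Made-up numbers purely for illustration (not the data from this post).
sample = {
    "barclays": {"2014-07-16": 40, "2014-07-17": 50},
    "bp": {"2014-07-16": 20, "2014-07-17": None},
}
avg = average_svi(sample)
print(avg)
```

Since Google Trends scales each query relative to its own peak, an average like this is an average of relative indices, not of absolute search counts.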


From the graph it is clear that the simple average SVI for the index constituents is quite different from queries for the index itself. An OLS regression between the two confirms the point:
But how much of the variance in the index search query can be explained by the searches for the constituents? Using a multiple regression where the search query for "FTSE 100" is the dependent variable, and the 36 companies I have data for are the independent variables, I get the following result:


The upper graph shows the result when each constituent's data has been weighted by its coefficient from the regression analysis. The lower graph shows the unweighted data. The adjusted R^2 for the regression is 48.8%, which is quite low if we assume that the search queries for the index are determined by its constituents. Based on these graphs we can conclude that the combined search queries for the index constituents have a very different pattern from those for the index itself.
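For readers who want to try this kind of regression themselves, the following is a minimal sketch in Python with NumPy. It is not the code behind the results in this post (the table at the bottom looks like standard regression output from a statistics package). The helper name `fit_ols` and the synthetic data are assumptions made for the example. It fits the regression with an intercept, computes the adjusted R^2, and builds the coefficient-weighted series from the constituents.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X, with an intercept.

    Returns (coefficients, adjusted R^2); coefficients[0] is the intercept.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises the fit for the number of regressors k.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, adj_r2

# Synthetic data for illustration only (not the data from this post):
# three hypothetical "constituent" series and a noisy "index" series.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 3))
y = 5.0 + X @ np.array([0.8, -0.3, 0.1]) + rng.normal(0, 10, size=500)

coef, adj_r2 = fit_ols(X, y)
# Constituent series weighted by their regression coefficients.
weighted = coef[0] + X @ coef[1:]
print(coef.round(3), round(adj_r2, 3))
```

With 36 regressors, the adjustment term matters more than in this three-variable toy example, which is why the adjusted rather than the plain R^2 is the figure worth reporting.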

In the next post I will look at how the averaged data compares with the index query when it comes to explaining volatility and volume at the index level.



Estimate Std. Error t value Pr(>|t|)
(Intercept) -45.9945 10.70911 -4.295 1.84E-05 ***
antofagasta -0.08598 0.10204 -0.843 0.399576
barclays -0.79759 0.13393 -5.955 3.11E-09 ***
bg.group 0.07388 0.04555 1.622 0.104973
bhp.billiton 0.25362 0.04331 5.856 5.63E-09 ***
bp 0.27475 0.07597 3.617 0.000307 ***
british.american.tobacco -0.02044 0.06471 -0.316 0.75208
bt.group 0.07351 0.0421 1.746 0.080954 .
bunzl -0.0606 0.04473 -1.355 0.175677
capita 0.19081 0.0597 3.196 0.001418 **
carnival -0.33172 0.08659 -3.831 0.000132 ***
centrica 0.00176 0.03652 0.048 0.961554
coca.cola 0.12069 0.05971 2.021 0.043406 *
compass.group -0.05013 0.04275 -1.173 0.241132
crh -0.05094 0.02877 -1.771 0.076751 .
diageo -0.07566 0.0308 -2.456 0.014131 *
easyjet -0.35516 0.09915 -3.582 0.00035 ***
experian 0.84559 0.12022 7.034 2.85E-12 ***
fresnillo -0.09778 0.05871 -1.666 0.095972 .
g4s 0.09061 0.06033 1.502 0.133294
gkn 0.04686 0.04956 0.946 0.344499
glaxosmithkline 0.09202 0.06327 1.454 0.146051
hargreaves.lansdown 0.1098 0.02244 4.894 1.08E-06 ***
imi 0.53225 0.16463 3.233 0.001247 **
johnson.matthey 0.10219 0.03344 3.056 0.002276 **
kingfisher -0.08054 0.09647 -0.835 0.403947
mondi -0.31752 0.06463 -4.913 9.77E-07 ***
pearson -1.06295 0.14024 -7.58 5.51E-14 ***
persimmon 0.23234 0.2629 0.884 0.37694
prudential 8.29241 0.45481 18.233 <2.00E-16 ***
rexam -0.18421 0.06884 -2.676 0.007523 **
rio.tinto 0.05902 0.03556 1.66 0.09716 .
royal.dutch.shell -0.32336 0.06729 -4.806 1.67E-06 ***
royal.mail 0.09751 0.03783 2.578 0.010026 *
schroders 0.13236 0.08896 1.488 0.136951
