Monday, July 28, 2014

Predicting iPhone sales with Google Trends

With the paper "Predicting the Present" by Hal Varian and Hyunyoung Choi in mind, I plotted average quarterly Google searches for "iPhone" against quarterly sales. The results seem to indicate that Google search activity is a leading indicator of sales by one quarter, although the correlation is not perfect.

On request, here is a scatterplot of the two variables. The quarterly change in average Google searches is on the x-axis and the quarterly change in iPhone sales is on the y-axis. In the first graph, there is no lag and the relationship is negative. If SVI precedes sales by one quarter, there is a positive relationship.
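The lagged comparison can be sketched in R as follows. The vectors `svi_quarterly` and `sales_quarterly` are placeholders for the real series (Google Trends exports and Apple's reported unit sales), so this is only a sketch of the computation, not the actual data.

```r
# Quarterly changes in average SVI and in iPhone unit sales
# (svi_quarterly and sales_quarterly are assumed input vectors)
svi   <- diff(svi_quarterly)
sales <- diff(sales_quarterly)

# Contemporaneous relationship (negative in my data)
cor(svi, sales)

# SVI leading sales by one quarter (positive in my data):
# pair the change in SVI at t-1 with the change in sales at t
cor(head(svi, -1), tail(sales, -1))

plot(head(svi, -1), tail(sales, -1),
     xlab = "Change in SVI (t-1)", ylab = "Change in sales (t)")
```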

What could this data say about the Apple Watch? It doesn't look good.



Friday, July 25, 2014

Jarque-Bera test comparison of residuals from GARCH model for FTSE 100

Below are the daily returns for FTSE 100 and the residuals from the GARCH(1, 1) model. As can be seen from the chart, heteroskedasticity has been removed from the residuals.

The histogram does not show any visible improvement in the normality of the residuals, but the Jarque-Bera test reveals that the normality of the data has been markedly improved: the chi-squared statistic has decreased from 8854 to 131. The null hypothesis of normality must, however, still be rejected. The Box-Ljung test for the model gives a p-value of 0,14, so we can choose not to reject the null hypothesis of independently distributed error terms.
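Both tests can be run on the standardized residuals from the fitted model. This assumes the `gfit.ru` object from the previous post; the tests come from the tseries and stats packages.

```r
library(tseries)  # jarque.bera.test()
library(rugarch)

# Standardized residuals from the fitted GARCH(1, 1) model
res <- as.numeric(residuals(gfit.ru, standardize = TRUE))

jarque.bera.test(res)                      # H0: normality (skewness 0, kurtosis 3)
Box.test(res, lag = 20, type = "Ljung-Box") # H0: no remaining autocorrelation
```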




Thursday, July 24, 2014

GARCH modelling for FTSE 100

I follow the instructions in this article from R-bloggers. I will be using the R library rugarch.

Volatility in financial time series tends to cluster. Since one of the conditions for carrying out a regression analysis using OLS is homoskedasticity, this creates a problem. We can solve it by modelling the changing volatility using Generalized Autoregressive Conditional Heteroskedasticity (GARCH).

As is suggested by R-bloggers, I will use daily data for FTSE 100 returns. I use adjusted daily prices from Google Finance. Here is the raw data:
A generally accepted explanation of this volatility clustering is that information arrives in chunks to the market and that this drives price reactions. It is these shocks that we seek to model using GARCH. Looking at the returns of FTSE 100 since 2004, it is clear that there was a big volatility spike at the end of 2008 during the outbreak of the financial crisis.

How much data should be used for estimation? R-bloggers writes that ideally tens of thousands of observations, but 2000 is not unreasonable; I have 2608. What GARCH model should be used? I will go with GARCH(1, 1), as recommended by R-bloggers. Financial time series are typically not normally distributed. We could hope that this is entirely due to the GARCH effect, but in practice a t-distribution performs better.

Following the instructions on R-bloggers, I do the following:

> gspec.ru <- ugarchspec(mean.model=list(armaOrder=c(0,0)), distribution="std")
> gfit.ru <- ugarchfit(gspec.ru, ftse.return)
> coef(gfit.ru)

I then plot the in-sample volatility estimate by writing:

> plot(sqrt(252) * gfit.ru@fit$sigma, type='l')
  


ARIMA model for the FTSE 100 index in R

The index value for FTSE 100 is downloaded from Google Finance using the R package quantmod. Here is the price data for the period 2004-2013.
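The download step looks roughly like this. Note that Google Finance has since been discontinued as a quantmod source, so the sketch below uses Yahoo Finance instead; the `^FTSE` symbol is Yahoo's ticker for the index.

```r
library(quantmod)

# Download FTSE 100 prices for 2004-2013 (Yahoo Finance; the original
# post used Google Finance, which is no longer available as a source)
getSymbols("^FTSE", src = "yahoo", from = "2004-01-01", to = "2013-12-31")

ftse.price <- Ad(FTSE)  # adjusted closing prices
plot(ftse.price)
```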



Next, I calculate the return on the index. The return seems to be stationary around zero, with a significant increase in volatility during the end of 2008.


A histogram of the price:
And of the return:
The return has a mean close to zero, a negative skewness of -0,16 and a very high kurtosis, 9,01. The descriptive statistics are calculated using the describe function from the psych library. The Jarque-Bera test, calculated using the tseries library, rejects the null hypothesis of normality.
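The statistics and the test can be produced like this, assuming `ftse.price` holds the adjusted price series from above:

```r
library(psych)    # describe()
library(tseries)  # jarque.bera.test()

# Daily log returns; drop the leading NA produced by diff()
ftse.return <- diff(log(ftse.price))[-1]

describe(as.numeric(ftse.return))          # n, mean, sd, skewness, kurtosis, ...
jarque.bera.test(as.numeric(ftse.return))  # H0: normality
```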


Observations 2608
Mean 0
Standard deviation 0,01
Median 0
Min -0,9
Max 0,9
Skewness -0,16
Kurtosis 9,01

The autocorrelation function shows that there is a significant amount of autocorrelation in the time series, although the magnitude of the autocorrelation is small.

The partial autocorrelation function is also significant for the first four lags. Even though the autocorrelations are significant, they are quite small.
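The two plots are produced with the standard `acf` and `pacf` functions:

```r
# ACF suggests the MA order, PACF the AR order
acf(as.numeric(ftse.return), lag.max = 30)
pacf(as.numeric(ftse.return), lag.max = 30)
```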

What ARIMA model should we use to correct for the autocorrelation? A good description of the model selection can be found here.
The autocorrelation function (ACF) plays the same role for MA terms that the PACF plays for AR terms--that is, the ACF tells you how many MA terms are likely to be needed to remove the remaining autocorrelation from the differenced series. If the autocorrelation is significant at lag k but not at any higher lags--i.e., if the ACF "cuts off" at lag k--this indicates that exactly k MA terms should be used in the forecasting equation. In the latter case, we say that the stationarized series displays an "MA signature," meaning that the autocorrelation pattern can be explained more easily by adding MA terms than by adding AR terms.
Based on the ACF and partial ACF plots, it seems like we should use an ARIMA(4, 0, 4) model.
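Fitting the model and checking its residuals can be done with the base `arima` function; this is a sketch of those steps, reusing `ftse.return` from above:

```r
library(tseries)  # jarque.bera.test()

# ARIMA(4, 0, 4) on the (already stationary) return series
fit <- arima(as.numeric(ftse.return), order = c(4, 0, 4))
fit

# Diagnostics on the residuals
acf(residuals(fit), lag.max = 30)  # remaining autocorrelation
jarque.bera.test(residuals(fit))   # normality of the error terms
```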

Compared to the distribution of the plain returns, the normality of the error terms seems to have improved. The kurtosis has decreased somewhat, but the skewness has actually gone up. The Jarque-Bera statistic is somewhat improved: the chi-squared is 7168, compared to 8855 for the plain returns.



                     Returns   ARIMA(4,0,4)
Observations         2608      2608
Mean                 0,02 %    0,02 %
Standard deviation   1,19 %    1,12 %
Median               0,02 %    0,06 %
Min                  -0,9      -0,9
Max                  0,9       0,9
Skewness             -0,16     -0,4
Kurtosis             9,01      8,07

And finally, here is the autocorrelation function for the residuals of the ARIMA(4,0,4) specification. As you can see, there are still some significant autocorrelations starting at the 25th lag.


Tuesday, July 22, 2014

Return and volatility distribution, FTSE 100

This is what the daily logarithmic returns on the FTSE 100 index have looked like since 2004.

 
 
And here is the volatility.
 
 

Friday, July 18, 2014

Google Trends and stock indexes


In order to use Google Trends data in financial analysis and forecasting, it is necessary to understand how to manipulate the data in the right way. Here, I demonstrate that there is a big difference between the average search pattern of index constituents and that of the index itself. That is, averaging search data for companies in the FTSE 100 index gives a very different pattern than the query for "FTSE 100".
Individual companies can have a high degree of correlation with the search data for the index, as is shown in the table at the bottom, but on the whole there does not seem to be a direct relationship between the two.

I have collected a data set consisting of the daily searches for stocks on FTSE 100 since 2004. This is what daily search activity for the query "FTSE 100" looks like:

Out of the 100 companies on FTSE 100, there is Google Trends data for 36 of them with somewhat complete data for the entire time period. First, let's take a look at what the average search activity has looked like:


From the graph it is clear that the simple average SVI for the index constituents is quite different from queries for the index itself. An OLS regression between the two confirms the point:
But how much of the variance in the index search query can be explained by the searches for the constituents? Using a multiple regression where the search query for "FTSE 100" is the dependent variable and the 36 companies I have data for are the independent variables, I get the following result:


The upper graph shows the constituent data weighted by the coefficients from the regression analysis; the lower graph shows the unweighted data. The adjusted R^2 for the regression is 48.8%, which is quite low if we assume that the search queries for the index are determined by its constituents. Based on these graphs we can conclude that the combined search queries for index constituents have a very different pattern from the index itself.
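The regression can be sketched as below. Here `svi` is an assumed data frame with one column per constituent SVI series plus a column `ftse100` holding the query for "FTSE 100"; the column names are placeholders.

```r
# Regress the index query on all 36 constituent queries
# (svi is an assumed data frame, ftse100 an assumed column name)
model <- lm(ftse100 ~ ., data = svi)

summary(model)$adj.r.squared  # about 0.49 in my data

# Fitted values = constituent SVIs weighted by their coefficients
plot(fitted(model), type = "l")
lines(svi$ftse100, col = "red")
```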

In a following post I will compare how the averaged data and the index query fare when it comes to explaining volatility and volume on an index level.



Estimate Std. Error t value Pr
(Intercept) -45.9945 10.70911 -4.295 1.84E-05 ***
antofagasta -0.08598 0.10204 -0.843 0.399576
barclays -0.79759 0.13393 -5.955 3.11E-09 ***
bg.group 0.07388 0.04555 1.622 0.104973
bhp.billiton 0.25362 0.04331 5.856 5.63E-09 ***
bp 0.27475 0.07597 3.617 0.000307 ***
british.american.tobacco -0.02044 0.06471 -0.316 0.75208
bt.group 0.07351 0.0421 1.746 0.080954 .
bunzl -0.0606 0.04473 -1.355 0.175677
capita 0.19081 0.0597 3.196 0.001418 **
carnival -0.33172 0.08659 -3.831 0.000132 ***
centrica 0.00176 0.03652 0.048 0.961554
coca.cola 0.12069 0.05971 2.021 0.043406 *
compass.group -0.05013 0.04275 -1.173 0.241132
crh -0.05094 0.02877 -1.771 0.076751 .
diageo -0.07566 0.0308 -2.456 0.014131 *
easyjet -0.35516 0.09915 -3.582 0.00035 ***
experian 0.84559 0.12022 7.034 2.85E-12 ***
fresnillo -0.09778 0.05871 -1.666 0.095972 .
g4s 0.09061 0.06033 1.502 0.133294
gkn 0.04686 0.04956 0.946 0.344499
glaxosmithkline 0.09202 0.06327 1.454 0.146051
hargreaves.lansdown 0.1098 0.02244 4.894 1.08E-06 ***
imi 0.53225 0.16463 3.233 0.001247 **
johnson.matthey 0.10219 0.03344 3.056 0.002276 **
kingfisher -0.08054 0.09647 -0.835 0.403947
mondi -0.31752 0.06463 -4.913 9.77E-07 ***
pearson -1.06295 0.14024 -7.58 5.51E-14 ***
persimmon 0.23234 0.2629 0.884 0.37694
prudential 8.29241 0.45481 18.233 <2e-16 ***
rexam -0.18421 0.06884 -2.676 0.007523 **
rio.tinto 0.05902 0.03556 1.66 0.09716 .
royal.dutch.shell -0.32336 0.06729 -4.806 1.67E-06 ***
royal.mail 0.09751 0.03783 2.578 0.010026 *
schroders 0.13236 0.08896 1.488 0.136951

Correlation between Google Trends and stock market volatility

Andersen (1996) writes that "price movements are caused primarily by the arrival of new information and the process that incorporates this information into market prices". This underlying information process affects stock market volatility and trade volumes. If Google Trends data represents the general flow of information, then we should see a correlation between Google Trends data, volatility and volume.

The data set consists of daily Google Trends observations for the query "FTSE 100" and daily returns, volatility and volume for FTSE 100. The correlation matrix shows a fairly high correlation (0.48) between volatility and the Google Trends data. The correlation between the Google Trends data and volume is surprisingly low, as is the correlation between volume and volatility.



Close Trends Return Volatility Volume
Close 1.00 -0.28 0.04 -0.24 -0.10
Trends 1.00 -0.06 0.48 -0.01
Return 1.00 0.03 -0.04
Volatility 1.00 0.16
Volume 1.00
Correlation matrix, FTSE 100
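A matrix like the one above can be computed in one call, assuming `ftse.data` is a data frame with columns Close, Trends, Return, Volatility and Volume (the name is a placeholder):

```r
# Pairwise correlations, ignoring rows with missing values
round(cor(ftse.data, use = "complete.obs"), 2)
```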





A univariate OLS regression where volatility is the dependent variable and Google Trends is the independent variable gives a slope that is significant at the 0.1% level and an R^2 of 23,51%.
 



            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) 0.00      0.00        -7.59    0.00      ***
Trends      0.00      0.00        27.28    <2e-16    ***
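The regression above is a single `lm` call, again assuming a data frame `ftse.data` with Volatility and Trends columns (a placeholder name):

```r
# Univariate OLS: volatility explained by the "FTSE 100" SVI
vol.model <- lm(Volatility ~ Trends, data = ftse.data)
summary(vol.model)  # slope significant, R^2 around 0.24 in my data
```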

Thursday, July 17, 2014

Indexing daily Google Trends data using R

If you want to use daily Google Trends data for statistical analysis, you will most likely want to go beyond the 90-day limitation Google has set on the data. How would you do that? You might start by simply piecing together the data Google gives you.

I've downloaded the csv file for each month between 2004 and 2013 for the query "FTSE 100". This is what you get:
Not very pretty, is it? If you compare it to the weekly Google Trends data for the same query, you can see that they look quite different from each other.


Ideally, we would like to get the increased accuracy of daily data while keeping the indexing intact. The way I've solved this problem is by reindexing the data back to the weekly index. Each weekly observation provides a reindexing point. Then you get this:
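The reindexing idea can be sketched as follows. Each downloaded chunk of daily SVI sits on its own 0-100 scale, so every week's daily observations are rescaled by the ratio between the weekly index value and the daily value at that week's reindexing point. This is a rough sketch, not the exact code I used; `daily` and `weekly` are assumed xts series, and weeks with a missing or zero anchor are left unscaled.

```r
library(xts)

reindex <- function(daily, weekly) {
  out <- daily
  wks <- index(weekly)
  for (i in seq_along(wks)) {
    # Daily observations belonging to this week
    in.week <- index(daily) >= wks[i] & index(daily) < wks[i] + 7
    # Daily value at the weekly reindexing point
    anchor <- as.numeric(daily[wks[i]])
    if (length(anchor) == 1 && !is.na(anchor) && anchor > 0)
      out[in.week] <- as.numeric(daily[in.week]) *
        as.numeric(weekly[i]) / anchor
  }
  out
}
```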

Looks a lot better, right? Here are two more examples, BP and Experian.


How much does Google Trends data change over time?


Will Google Trends data change depending on when it is downloaded? To answer the question, I downloaded data for the UK stock index FTSE 100 twice, with two months in between.
Since Google Trends creates an index from the search queries, the result may change over time. To test how large this change might be, I've downloaded the query for "FTSE 100" twice, once in May and once in July. As can be seen from the image, the results differ slightly.
The R^2 for a linear regression of the two time series is 96,6%. On a short time horizon, there are only minor changes in the data.