Saturday, December 20, 2014

Most googled words 2014

These are the five search terms with the biggest increase in interest during 2014. The new iPhone tops the list, followed by Disney's Frozen, the emerging markets classifieds company OLX, India's Flipkart and Google Drive. Of these search terms, OLX has the highest search volume, while the new iPhone was by far the most Googled term on the list during the month of its launch.

Monday, December 08, 2014

Creating daily search volume data from weekly and daily data using R

In my previous post, I explained the general principle behind using Google Trends' weekly and daily data to create daily time series longer than 90 days. Here, I provide the steps to take in R to achieve the same results.


#Start by copying these functions in R. 
#Then run the following code:
#NB! In order for the code to run properly, you will have to specify the download directory of your default browser (downloadDir)

downloadDir="C:/downloads"

url=vector()
filePath=vector()
adjustedWeekly=data.frame()
keyword="google trends"



#Create URLs to daily data
for(i in 1:12){
    url[i]=URL_GT(keyword, year=2013, month=i, length=1)
}

#Download
for(i in 1:length(url)){
    filePath[i]=downloadGT(url[i], downloadDir)
}

dailyData=readGT(filePath)
dailyData=dailyData[order(dailyData$Date),]

#Get weekly data
url=URL_GT(keyword, year=2013, month=1, length=12)
filePath=downloadGT(url, downloadDir)
weeklyData=readGT(filePath)

adjustedDaily=dailyData[1:2]
adjustedDaily=merge(adjustedDaily, weeklyData[1:2], by="Date", all=TRUE)
adjustedDaily[4:5]=NA
names(adjustedDaily)=c("Date", "Daily", "Weekly", "Adjustment_factor", "Adjusted_daily")

#Adjust for date mismatch: where a weekly date has no daily value, reuse the previous row's daily value
for(i in seq_len(nrow(adjustedDaily))){
    if(i>1 && is.na(adjustedDaily$Daily[i])) adjustedDaily$Daily[i]=adjustedDaily$Daily[i-1]
}

#Create adjustment factor
adjustedDaily$Adjustment_factor=adjustedDaily$Weekly/adjustedDaily$Daily

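#Remove data before first available adjustment factor
#(firstRow/lastRow are used instead of start/stop to avoid masking the base R functions)
firstRow=which(is.finite(adjustedDaily$Adjustment_factor))[1]
lastRow=nrow(adjustedDaily)
adjustedDaily=adjustedDaily[firstRow:lastRow,]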

#Fill in missing adjustment factors
for(i in 1:nrow(adjustedDaily)){
    if(is.na(adjustedDaily$Adjustment_factor[i])) adjustedDaily$Adjustment_factor[i]=adjustedDaily$Adjustment_factor[i-1]
}

#Calculate adjusted daily values
adjustedDaily$Adjusted_daily=adjustedDaily$Daily*adjustedDaily$Adjustment_factor


#Plot the results
library(ggplot2)
ggplot(adjustedDaily, aes(x=Date, y=Adjusted_daily))+geom_line(col="blue")+ggtitle("SVI for Google Trends")

Sunday, December 07, 2014

Creating daily search volume data from weekly and daily data

Risteski & Davcev (2014) refer to my method for getting Google Trends data for more than 90 days in their recent paper. For those interested, I've outlined the key principle in this post.

This post illustrates a method for combining daily and weekly search data from Google Trends to create a daily time series covering a period longer than the 90 days of daily data that Google Trends provides. For the sake of simplicity, I have included only two weeks, but the general principle can easily be extended to longer time series. SVI in the tables refers to Google Trends' search volume index.

For an example of how this can be done using R, look here.

Step 1: Collect daily search data from Google Trends and combine it into one array.

Date Daily SVI
1.1.2014 72
2.1.2014 96
3.1.2014 16
4.1.2014 70
5.1.2014 61
6.1.2014 97
7.1.2014 44
8.1.2014 32
9.1.2014 8
10.1.2014 13
11.1.2014 67
12.1.2014 9
13.1.2014 63
14.1.2014 91
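
As a rough illustration of Step 1, the sketch below stacks two separately downloaded daily chunks into one series in R. The data frames chunk1 and chunk2 and the column name SVI are assumptions made for the example (they simply reuse the numbers from the table above); real downloads would be read from the exported files instead.

```r
# Two made-up chunks standing in for separately downloaded daily files.
chunk1 <- data.frame(
  Date = seq(as.Date("2014-01-01"), by = "day", length.out = 7),
  SVI  = c(72, 96, 16, 70, 61, 97, 44)
)
chunk2 <- data.frame(
  Date = seq(as.Date("2014-01-08"), by = "day", length.out = 7),
  SVI  = c(32, 8, 13, 67, 9, 63, 91)
)

# Stack the chunks, sort by date, and keep the first row for any repeated date.
daily <- do.call(rbind, list(chunk1, chunk2))
daily <- daily[order(daily$Date), ]
daily <- daily[!duplicated(daily$Date), ]
```

Dropping repeated dates only matters when the downloaded periods overlap; with non-overlapping chunks it leaves the data unchanged.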

Step 2: Collect weekly search data over the same time period

Date        Daily SVI   Weekly SVI
1.1.2014    72          20
2.1.2014    96          -
3.1.2014    16          -
4.1.2014    70          -
5.1.2014    61          -
6.1.2014    97          -
7.1.2014    44          -
8.1.2014    32          30
9.1.2014    8           -
10.1.2014   13          -
11.1.2014   67          -
12.1.2014   9           -
13.1.2014   63          -
14.1.2014   91          -

Step 3: Adjust the daily data based on the weekly data

The key here is the adjustment factor. It is the weekly SVI divided by the daily SVI for those dates where there are values for both. For other data points, the last available adjustment factor is applied.

Date        Daily SVI   Weekly SVI   Adjustment factor   Adjusted values
1.1.2014    72          20           0,3                 20,0
2.1.2014    96          -            0,3                 26,7
3.1.2014    16          -            0,3                 4,4
4.1.2014    70          -            0,3                 19,4
5.1.2014    61          -            0,3                 16,9
6.1.2014    97          -            0,3                 26,9
7.1.2014    44          -            0,3                 12,2
8.1.2014    32          30           0,9                 30,0
9.1.2014    8           -            0,9                 7,5
10.1.2014   13          -            0,9                 12,2
11.1.2014   67          -            0,9                 62,8
12.1.2014   9           -            0,9                 8,4
13.1.2014   63          -            0,9                 59,1
14.1.2014   91          -            0,9                 85,3
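
The three steps can be sketched in a few lines of R using the figures from the tables above. This is a minimal sketch rather than the full download-and-merge script: the object names (daily, weekly, combined, factor_raw, known) are chosen for the example, and it assumes the first row has both a daily and a weekly value.

```r
# Daily and weekly SVI values taken from the tables above.
daily <- data.frame(
  Date  = seq(as.Date("2014-01-01"), by = "day", length.out = 14),
  Daily = c(72, 96, 16, 70, 61, 97, 44, 32, 8, 13, 67, 9, 63, 91)
)
weekly <- data.frame(
  Date   = as.Date(c("2014-01-01", "2014-01-08")),
  Weekly = c(20, 30)
)

# Keep every daily row and attach the weekly value where one exists.
combined <- merge(daily, weekly, by = "Date", all.x = TRUE)

# Adjustment factor on the dates that have both values (NA elsewhere).
factor_raw <- combined$Weekly / combined$Daily

# Carry the last available factor forward: cumsum() counts how many
# factors have been seen so far, which indexes into the known factors.
# Assumes the first row has a factor.
known <- !is.na(factor_raw)
combined$Adjustment_factor <- factor_raw[known][cumsum(known)]

# Adjusted daily value = daily SVI scaled by the carried-forward factor.
combined$Adjusted_daily <- combined$Daily * combined$Adjustment_factor

print(round(combined$Adjusted_daily, 1))
```

The adjustment factors in the table are shown rounded to one decimal (20/72 and 30/32), while the adjusted values are computed from the unrounded factors, which is what the sketch does as well.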

Time series plot

As can be seen from the graph below, the day-to-day pattern within each week remains the same, but its level is adjusted down or up based on the weekly search volumes. This way, the relative volumes between weeks are preserved and the data are comparable over periods longer than the 90 days provided by Google Trends.

Data artifacts in daily Google Trends data

An important consideration when merging daily Google Trends data is that the first and last day of a month are sometimes shown as zero in the data. This can create serious inconsistencies, because a zero value cannot be rescaled by a percentage adjustment. To account for this, I recommend that the first and last 30 days of the time series are stripped out before merging the data.
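
One way to apply this kind of trimming in R is sketched below. The helper trim_edges() and its n_trim argument are hypothetical names introduced for this example, and the ten-day chunk is made-up data with a zero at each end; the number of rows to trim is left to the reader.

```r
# Hypothetical helper: drop the first and last n_trim rows of one daily chunk.
trim_edges <- function(chunk, n_trim) {
  chunk <- chunk[order(chunk$Date), ]
  n <- nrow(chunk)
  if (n <= 2 * n_trim) {
    stop("chunk is too short to trim ", n_trim, " rows from each end")
  }
  chunk[(n_trim + 1):(n - n_trim), ]
}

# Made-up example: a 10-day chunk with zero artifacts at both ends.
chunk <- data.frame(
  Date = seq(as.Date("2014-01-01"), by = "day", length.out = 10),
  SVI  = c(0, 77, 80, 75, 79, 82, 78, 76, 81, 0)
)
trimmed <- trim_edges(chunk, n_trim = 1)

# Any zeros left in the middle cannot be rescaled either, so mark them missing.
trimmed$SVI[trimmed$SVI == 0] <- NA
```

Because the edges are removed, the downloaded periods need to overlap by at least the trimmed amount, otherwise the merged series will have gaps.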

To illustrate the issue, I've copied the daily Google Trends data for the search term "CAC40" below. The value for 1 January 2014 is zero in the upper graph, while it's 77 in the lower graph.

January - February 2014


December 2013 - February 2014



David Leinweber writes more on the topic here.