Saturday, December 20, 2014
Most googled words 2014
These are the five search terms with the biggest increase in interest during 2014. The new iPhone tops the list, followed by Disney's Frozen, the emerging markets classifieds company OLX, India's Flipkart and Google Drive.
Of these search terms, OLX has the highest overall search volume, while the new iPhone was by far the most googled term on the list during the month of its launch.
Labels: 2014, flipkart, frozen, google drive, google trends, iphone 6, list, most googled, olx
Monday, December 08, 2014
Creating daily search volume data from weekly and daily data using R
In my previous post, I explained the general principle behind using Google Trends' weekly and daily data to create daily time series longer than 90 days. Here, I provide the steps to take in R to achieve the same results.
# Start by copying these functions into R.
# Then run the following code.
# NB! For the code to run properly, you must set downloadDir to the download directory of your default browser.
downloadDir="C:/downloads"
url=vector()
filePath=vector()
keyword="google trends"

# Create URLs for the twelve months of daily data
for(i in 1:12){
  url[i]=URL_GT(keyword, year=2013, month=i, length=1)
}

# Download the daily data
for(i in 1:length(url)){
  filePath[i]=downloadGT(url[i], downloadDir)
}
dailyData=readGT(filePath)
dailyData=dailyData[order(dailyData$Date),]

# Get weekly data covering the same twelve months
url=URL_GT(keyword, year=2013, month=1, length=12)
filePath=downloadGT(url, downloadDir)
weeklyData=readGT(filePath)

# Merge the daily and weekly series by date and add empty columns
# for the adjustment factor and the adjusted daily values
adjustedDaily=dailyData[1:2]
adjustedDaily=merge(adjustedDaily, weeklyData[1:2], by="Date", all=T)
adjustedDaily[4:5]=NA
names(adjustedDaily)=c("Date", "Daily", "Weekly", "Adjustment_factor", "Adjusted_daily")

# Adjust for date mismatch: carry the previous daily value forward
# where a weekly date has no matching daily observation
for(i in 1:nrow(adjustedDaily)){
  if(is.na(adjustedDaily$Daily[i])) adjustedDaily$Daily[i]=adjustedDaily$Daily[i-1]
}

# Create the adjustment factor (weekly SVI divided by daily SVI)
adjustedDaily$Adjustment_factor=adjustedDaily$Weekly/adjustedDaily$Daily

# Remove data before the first available adjustment factor
start=which(is.finite(adjustedDaily$Adjustment_factor))[1]
stop=nrow(adjustedDaily)
adjustedDaily=adjustedDaily[start:stop,]

# Fill in missing adjustment factors with the last available factor
for(i in 1:nrow(adjustedDaily)){
  if(is.na(adjustedDaily$Adjustment_factor[i])) adjustedDaily$Adjustment_factor[i]=adjustedDaily$Adjustment_factor[i-1]
}

# Calculate adjusted daily values
adjustedDaily$Adjusted_daily=adjustedDaily$Daily*adjustedDaily$Adjustment_factor

# Plot the results
library(ggplot2)
ggplot(adjustedDaily, aes(x=Date, y=Adjusted_daily))+geom_line(col="blue")+ggtitle("SVI for Google Trends")
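As an optional sanity check, the adjusted daily values can be aggregated back to weekly means, which should roughly track the shape of the downloaded weekly series. This is only a sketch, and it assumes that readGT returns the Date column as an R Date class (the plotting step above relies on the same assumption):

# Optional sanity check (sketch): weekly means of the adjusted daily values
# should roughly follow the shape of the downloaded weekly SVI.
adjustedDaily$Week=cut(adjustedDaily$Date, breaks="week")
weeklyCheck=aggregate(Adjusted_daily~Week, data=adjustedDaily, FUN=mean)
head(weeklyCheck)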
Sunday, December 07, 2014
Creating daily search volume data from weekly and daily data
Risteski & Davcev (2014) refer to my method for obtaining Google Trends data for more than 90 days in their recent paper. For those interested, I've outlined the key principle in this post.
This post illustrates a method for combining daily and weekly search data from Google Trends in order to create a daily time series covering a period longer than the 90 days of daily data that Google Trends provides. For the sake of simplicity, I have only included two weeks, but the general principle can easily be extended to longer time series. SVI in the tables refers to Google Trends' search volume index.
For an example of how this can be done using R, look here.
Step 1: Collect daily search data from Google Trends and combine it into one array.
Date | Daily SVI |
1.1.2014 | 72 |
2.1.2014 | 96 |
3.1.2014 | 16 |
4.1.2014 | 70 |
5.1.2014 | 61 |
6.1.2014 | 97 |
7.1.2014 | 44 |
8.1.2014 | 32 |
9.1.2014 | 8 |
10.1.2014 | 13 |
11.1.2014 | 67 |
12.1.2014 | 9 |
13.1.2014 | 63 |
14.1.2014 | 91 |
Step 2: Collect weekly search data over the same time period
Date | Daily SVI | Weekly SVI |
1.1.2014 | 72 | 20 |
2.1.2014 | 96 | |
3.1.2014 | 16 | |
4.1.2014 | 70 | |
5.1.2014 | 61 | |
6.1.2014 | 97 | |
7.1.2014 | 44 | |
8.1.2014 | 32 | 30 |
9.1.2014 | 8 | |
10.1.2014 | 13 | |
11.1.2014 | 67 | |
12.1.2014 | 9 | |
13.1.2014 | 63 | |
14.1.2014 | 91 | |
Step 3: Adjust the daily data based on the weekly data
The key here is the adjustment factor: the weekly SVI divided by the daily SVI for the dates where both values are available. For all other data points, the last available adjustment factor is applied. Note that the factors in the table are rounded to one decimal, while the adjusted values are calculated with the unrounded factors (for example, 20/72 ≈ 0.28, and 96 × 20/72 ≈ 26.7). A minimal R sketch reproducing the table follows below it.
Date | Daily SVI | Weekly SVI | Adjustment factor | Adjusted values |
1.1.2014 | 72 | 20 | 0.3 | 20.0 |
2.1.2014 | 96 | | 0.3 | 26.7 |
3.1.2014 | 16 | | 0.3 | 4.4 |
4.1.2014 | 70 | | 0.3 | 19.4 |
5.1.2014 | 61 | | 0.3 | 16.9 |
6.1.2014 | 97 | | 0.3 | 26.9 |
7.1.2014 | 44 | | 0.3 | 12.2 |
8.1.2014 | 32 | 30 | 0.9 | 30.0 |
9.1.2014 | 8 | | 0.9 | 7.5 |
10.1.2014 | 13 | | 0.9 | 12.2 |
11.1.2014 | 67 | | 0.9 | 62.8 |
12.1.2014 | 9 | | 0.9 | 8.4 |
13.1.2014 | 63 | | 0.9 | 59.1 |
14.1.2014 | 91 | | 0.9 | 85.3 |
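As a minimal sketch, the calculation in the table above can be reproduced in R with a few lines. The column names here are chosen for illustration only, and the weekly SVI is treated as observed only on the first day of each week:

# Minimal sketch of the adjustment using the two-week example above.
daily=c(72, 96, 16, 70, 61, 97, 44, 32, 8, 13, 67, 9, 63, 91)
weekly=c(20, rep(NA, 6), 30, rep(NA, 6))
example=data.frame(Date=seq(as.Date("2014-01-01"), by="day", length.out=14),
                   Daily=daily, Weekly=weekly)
# Adjustment factor: weekly SVI divided by daily SVI where both are available
example$Adjustment_factor=example$Weekly/example$Daily
# Carry the last available adjustment factor forward to the other days
for(i in 2:nrow(example)){
  if(is.na(example$Adjustment_factor[i])) example$Adjustment_factor[i]=example$Adjustment_factor[i-1]
}
# Adjusted daily values, matching the last column of the table
example$Adjusted_daily=example$Daily*example$Adjustment_factor
round(example$Adjusted_daily, 1)
# 20.0 26.7  4.4 19.4 16.9 26.9 12.2 30.0  7.5 12.2 62.8  8.4 59.1 85.3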
Time series plot
As can be seen from the graph below, the day-to-day pattern within each week remains the same, but each week is adjusted down or up based on the weekly search volumes. This way, the relative volumes between weeks are preserved and the data are comparable over periods longer than the 90 days of daily data provided by Google Trends.
Data artifacts in daily Google Trends data
An important consideration when merging daily Google Trends data is that the first and last days of a month are sometimes shown as zero in the data. This can create serious inconsistencies, as a percentage adjustment cannot be applied to zero values. To account for this, I recommend that the first and last 30 days of the daily time series be stripped out before merging the data.
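As a minimal sketch of that trimming step, assuming a data frame dailyData with a Date column of class Date (as in the R post above):

# Drop the first and last 30 days of the daily series so that
# zero values at the month boundaries cannot distort the adjustment factors.
dailyData=dailyData[dailyData$Date > min(dailyData$Date)+30 &
                    dailyData$Date < max(dailyData$Date)-30, ]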
To illustrate the issue, I've copied the daily Google Trends data for the search term "CAC40" below. The value for 1 January 2014 is zero in the upper graph, while it's 77 in the lower graph.
Upper graph: January - February 2014
Lower graph: December 2013 - February 2014
David Leinweber writes more on the topic here.