Monday, December 08, 2014

Creating daily search volume data from weekly and daily data using R

In my previous post, I explained the general principle behind using Google Trends' weekly and daily data to create daily time series longer than 90 days. Here, I provide the steps to take in R to achieve the same results.


#Start by copying these functions in R. 
#Then run the following code:
#NB! In order for the code to run properly, you will have to specify the download directory of your default browser (downloadDir)

downloadDir="C:/downloads"

url=vector()
filePath=vector()
adjustedWeekly=data.frame()
keyword="google trends"



#Create URLs to daily data
for(i in 1:12){
    url[i]=URL_GT(keyword, year=2013, month=i, length=1)
}

#Download
for(i in 1:length(url)){
    filePath[i]=downloadGT(url[i], downloadDir)
}

dailyData=readGT(filePath)
dailyData=dailyData[order(dailyData$Date),]

#Get weekly data
url=URL_GT(keyword, year=2013, month=1, length=12)
filePath=downloadGT(url, downloadDir)
weeklyData=readGT(filePath)

adjustedDaily=dailyData[1:2]
adjustedDaily=merge(adjustedDaily, weeklyData[1:2], by="Date", all=T)
adjustedDaily[4:5]=NA
names(adjustedDaily)=c("Date", "Daily", "Weekly", "Adjustment_factor", "Adjusted_daily")

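#Adjust for date mismatch: fill a missing daily value with the previous row's value
#(start at row 2, since row 1 has no previous row to copy from)
for(i in 2:nrow(adjustedDaily)){
    if(is.na(adjustedDaily$Daily[i])) adjustedDaily$Daily[i]=adjustedDaily$Daily[i-1]
}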

#Create adjustment factor
adjustedDaily$Adjustment_factor=adjustedDaily$Weekly/adjustedDaily$Daily

#Remove data before first available adjustment factor
start=which(is.finite(adjustedDaily$Adjustment_factor))[1]
stop=nrow(adjustedDaily)
adjustedDaily=adjustedDaily[start:stop,]

#Fill in missing adjustment factors
for(i in 1:nrow(adjustedDaily)){
    if(is.na(adjustedDaily$Adjustment_factor[i])) adjustedDaily$Adjustment_factor[i]=adjustedDaily$Adjustment_factor[i-1]
}

#Calculate adjusted daily values
adjustedDaily$Adjusted_daily=adjustedDaily$Daily*adjustedDaily$Adjustment_factor


#Plot the results
library(ggplot2)
ggplot(adjustedDaily, aes(x=Date, y=Adjusted_daily))+geom_line(col="blue")+ggtitle("SVI for Google Trends")
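
To see what the adjustment step does without downloading anything, here is a minimal sketch on made-up numbers. The toy values and the names daily, weekly and adj are assumptions for illustration only; the real code above works on the files returned by readGT. Each week's daily values are rescaled by the ratio of the weekly value to the daily value on the date where both are available, and that ratio is carried forward until the next weekly observation.

```r
# Hypothetical toy data standing in for the Google Trends downloads
daily <- data.frame(
  Date  = as.Date("2013-01-06") + 0:13,
  Daily = c(50, 60, 70, 80, 90, 100, 40,   # first week
            20, 25, 30, 35, 40, 45, 50)    # second week
)
weekly <- data.frame(
  Date   = as.Date(c("2013-01-06", "2013-01-13")),
  Weekly = c(25, 40)
)

# Weekly is NA on every date that has no weekly observation
adj <- merge(daily, weekly, by = "Date", all = TRUE)

# The factor is only defined where both a weekly and a daily value exist
adj$Adjustment_factor <- adj$Weekly / adj$Daily

# Carry the last known factor forward to the following days
for (i in 2:nrow(adj)) {
  if (is.na(adj$Adjustment_factor[i])) {
    adj$Adjustment_factor[i] <- adj$Adjustment_factor[i - 1]
  }
}

adj$Adjusted_daily <- adj$Daily * adj$Adjustment_factor
print(adj)
```

In this toy case the first week is scaled by 25/50 and the second by 40/20, so the two weeks end up on the scale of the weekly series.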

21 comments:

Unknown said...

Erik, thanks for the new code which works great, except that you forgot to set the working directory

downloadDir="C:/Users/Dean/Downloads"
setwd(downloadDir)

Without that line of code, your program will not work because it won't be able to find the csv file

Unknown said...

Dear Erik

I slightly modified your code to download data from 2004 up to 2014, but after downloading this file

report (100).csv

it is not able to proceed and continuously asks for each file to be saved in a separate directory (i.e. "Save as").

Do you think it is possible to modify your original functions to cancel the

report.csv file

as soon as its data are saved in R?

Regards, Dean

Unknown said...

For some reason that I am trying to figure out, your code is unable to save
these files in the directory

report (101).csv
report (102).csv
...

and the like. I have tried to find whether you put some limit in your R functions, but I did not find any. Or is it a limit imposed by Google itself?

Unknown said...

I am thinking of removing the intermediate data using

file.remove(file.path(downloadDir, list.files(downloadDir)))

I hope that in the next few days I will find at least an intermediate solution

Unknown said...

This command is even more precise, as it eliminates only the csv files (and not the rest)

file.remove(Sys.glob("*.csv"))
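
A slightly narrower variant of the same idea, offered only as a sketch: it assumes the downloads are named report.csv, report (1).csv and so on, as in the comments above, and it looks in a given folder rather than in the working directory. The helper name removeReports is made up for this example.

```r
# Hypothetical helper: delete only the Google Trends report files in one folder
removeReports <- function(dir) {
  # Match "report.csv", "report (1).csv", ... but leave other csv files alone
  files <- list.files(dir, pattern = "^report.*\\.csv$", full.names = TRUE)
  file.remove(files)
}

# Usage (commented out so nothing is deleted by accident):
# removeReports(downloadDir)
```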

ErikJJ said...

Hi Dean,

I haven't encountered that problem myself, and it sounds like an issue with the browser. Which one are you using?

I use Firefox together with Tab Mix Plus, which allows me to automate the downloads completely.

Unknown said...

I use Chrome

haohan said...

Dear Erik,
Thanks for the code. When I downloaded 1500 csv files, google kept saying "you have reached your quota limit" and wouldn't allow me to download. I have to wait until the next day to continue. Since I have about 70,000 files to download, could you provide a solution for that? Thank you!

Unknown said...

Dear Haohan

that limit cannot be overcome. The only thing you can do is to create enough Google accounts to download your desired number of CSV files

Regards, Dean

ErikJJ said...

I'm afraid that having multiple Google accounts might not be enough either. I've used a VPN myself to double the file limit. Unfortunately there's no really good way to get around this problem.

Haohan said...

Dear Erik,
Yes, changing accounts cannot solve the problem. Actually, I tried to use different accounts on different computers in our library, but it is still blocked by Google. It seems that once an IP address has reached its limit, other similar IPs will also be blocked by Google. I can manually download the data on the other computers, but I cannot use your code to download. Could you provide more detailed information about how to use a VPN to overcome the problem? Thank you!

Marco [マルコ] said...

Hi everyone!!! I have a question: Does anyone know what to modify in order to combine the data of not just one year, but of 10?
And also, besides the plot that is finally obtained, how could I obtain a table with the final adjusted values that were plotted?
Thanks a lot in advance for your answer!
*PS: I am just an R beginner, so I have many doubts about this :o

ErikJJ said...

For those having trouble downloading more than 100 files using Chrome, I suggest changing your browser to Firefox. On a Mac, you can do it in R using this line of code:

options(browser="/usr/bin/open -a Firefox")

Anonymous said...

Is there a possibility to make a loop to download daily data for a search query from 2004 to present?

ErikJJ said...

Yeah, that's easy with R. Just create a sequence of months (1,4,7,10) and years, and run the URL_GT function through all of those.
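
As a rough sketch of that suggestion: the block below builds every combination of start month and year, then calls URL_GT on each pair. URL_GT is the helper from the post's function file and is not defined here, so the call is skipped when it is missing. Passing length=3 for a three-month window is an assumption based on how the post uses length=1 and length=12; the year range is just an example.

```r
keyword <- "google trends"

# One row per quarter; month varies fastest, so rows are in chronological order
starts <- expand.grid(month = c(1, 4, 7, 10), year = 2004:2014)

# URL_GT() comes from the post's function file; length = 3 is an assumption
if (exists("URL_GT")) {
  url <- mapply(
    function(y, m) URL_GT(keyword, year = y, month = m, length = 3),
    starts$year, starts$month
  )
}

head(starts)
```

The resulting url vector can then be passed through the same download loop as in the post.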

adeel said...

Can you give me code to select a daily time series from 8/8/2008 to date, for the USA only, with the keyword DJI? How do I add the function? Give me the code in a single block and I'll change it for other queries. Can you do that?

Unknown said...

I tried to use this code to get data but it didn't work. Using a firefox browser,
I got this message on firefox

404. That’s an error.

The requested URL was not found on this server. That’s all we know.

help please
Thanks
