
Monday, December 08, 2014

Creating daily search volume data from weekly and daily data using R

In my previous post, I explained the general principle behind using Google Trends' weekly and daily data to create daily time series longer than 90 days. Here, I provide the steps to take in R to achieve the same results.


#Start by copying the functions from the previous post (URL_GT, downloadGT, readGT) into R.
#Then run the following code:
#NB! In order for the code to run properly, you will have to specify the download directory of your default browser (downloadDir)

downloadDir="C:/downloads"

url=vector()
filePath=vector()
adjustedWeekly=data.frame()
keyword="google trends"



#Create URLs for daily data (one request per month of 2013)
for(i in 1:12){
    url[i]=URL_GT(keyword, year=2013, month=i, length=1)
}

#Download
for(i in 1:length(url)){
    filePath[i]=downloadGT(url[i], downloadDir)
}

dailyData=readGT(filePath)
dailyData=dailyData[order(dailyData$Date),]

#Get weekly data
url=URL_GT(keyword, year=2013, month=1, length=12)
filePath=downloadGT(url, downloadDir)
weeklyData=readGT(filePath)

adjustedDaily=dailyData[1:2]
adjustedDaily=merge(adjustedDaily, weeklyData[1:2], by="Date", all=T)
adjustedDaily[4:5]=NA
names(adjustedDaily)=c("Date", "Daily", "Weekly", "Adjustment_factor", "Adjusted_daily")

#Adjust for date mismatch: weekly observations may fall on dates missing from
#the daily series, so carry the last available daily value forward
for(i in 2:nrow(adjustedDaily)){
    if(is.na(adjustedDaily$Daily[i])) adjustedDaily$Daily[i]=adjustedDaily$Daily[i-1]
}

#Create adjustment factor
adjustedDaily$Adjustment_factor=adjustedDaily$Weekly/adjustedDaily$Daily

#Remove data before first available adjustment factor
start=which(is.finite(adjustedDaily$Adjustment_factor))[1]
stop=nrow(adjustedDaily)
adjustedDaily=adjustedDaily[start:stop,]

#Fill in missing adjustment factors
for(i in 1:nrow(adjustedDaily)){
    if(is.na(adjustedDaily$Adjustment_factor[i])) adjustedDaily$Adjustment_factor[i]=adjustedDaily$Adjustment_factor[i-1]
}

#Calculate adjusted daily values
adjustedDaily$Adjusted_daily=adjustedDaily$Daily*adjustedDaily$Adjustment_factor


#Plot the results
library(ggplot2)
ggplot(adjustedDaily, aes(x=Date, y=Adjusted_daily))+geom_line(col="blue")+ggtitle("SVI for Google Trends")
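
The adjusted values are kept in the adjustedDaily data frame, so besides plotting them you can also save them as a table. A minimal sketch (the output file name is just an example):

#Optionally write the adjusted series to a CSV table
write.csv(adjustedDaily[c("Date", "Adjusted_daily")], file.path(downloadDir, "adjusted_daily_svi.csv"), row.names=FALSE)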

21 comments:

  1. Erik, thanks for the new code which works great, except that you forgot to set the working directory

    downloadDir="C:/Users/Dean/Downloads"
    setwd(downloadDir)

    Without that line of code, your program will not work because it won't be able to find the CSV files.

  2. Dear Erik

    I slightly modified your code to download data from 2004 up to 2014, but after downloading this file

    report (100).csv

    it is not able to proceed, and the browser keeps asking where to save it (i.e. "Save as").

    Do you think it is possible to modify your original functions to delete the

    report.csv file

    as soon as its data are read into R?

    Regards, Dean

  3. For some reason that I am still trying to figure out, your code is unable to save
    these files in the directory

    report (101).csv
    report (102).csv
    ...

    and the like. I have tried to find whether you put some limit in your R functions, but I did not find any. Or is it some limit of Google itself?

  4. I am thinking of removing the intermediate data using

    file.remove(file.path(downloadDir, list.files(downloadDir)))

    I hope in the next few days I will find at least an intermediate solution.

  5. This command is even more precise, eliminating only the CSV files (and not the rest), provided the working directory is set to the download directory:

    file.remove(Sys.glob("*.csv"))

  6. Hi Dean,

    I haven't encountered that problem myself, and it sounds like an issue with the browser. Which one are you using?

    I use Firefox together with Tab Mix Plus, which allows me to automate the downloads completely.

  7. haohan, 6:05 am

    Dear Erik,
    Thanks for the code. When I downloaded 1,500 CSV files, Google kept saying "you have reached your quota limit" and wouldn't let me download any more. I had to wait until the next day to continue. Since I have about 70,000 files to download, could you provide a solution for that? Thank you!

  8. Dear Haohan

    That limit cannot be overcome. The only thing you can do is to create enough Google accounts to download your desired number of CSV files.

    Regards, Dean

  9. I'm afraid that having multiple Google accounts might not be enough either. I've used a VPN myself to double the file limit. Unfortunately there's no really good way to get around this problem.

  10. Haohan, 1:20 pm

    Dear Erik,
    Yes, changing accounts does not solve the problem. Actually, I tried to use different accounts on different computers in our library, but it is still blocked by Google. It seems that once an IP address reaches its limit, other similar IPs will also be blocked by Google. I can manually download the data on the other computers, but I cannot use your code to download. Could you provide more detailed information about how to use a VPN to overcome the problem? Thank you!

  11. Hi everyone! I have a question: does anyone know what to modify in order to combine the data of not just one year, but of ten?
    And also, besides the plot that is finally obtained, how could I obtain a table with the final, adjusted values that were plotted?
    Thanks a lot in advance for your answer!
    *PS: I am just an R beginner, so I have many doubts about this :o

  14. For those having trouble downloading more than 100 files using Chrome, I suggest switching to another browser. You can point R at a different browser with this line of code (this example selects Safari on macOS):

    options(browser="/usr/bin/open -a Safari")

  15. Anonymous, 11:20 am

    Is there a possibility to make a loop to download daily data for a search query from 2004 to present?

  16. Yeah, that's easy with R. Just create a sequence of months (1,4,7,10) and years, and run the URL_GT function through all of those (see the sketch after the comments).

  17. Can you give me code to select a daily time series from 8/8/2008 to date, for the USA only, with the keyword DJI? How do I add that to the function? Give me the code in a single block and I'll change it for other queries. Can you do that?

  20. I tried to use this code to get data, but it didn't work. Using Firefox, I got this message:

    404. That’s an error.

    The requested URL was not found on this server. That’s all we know.

    Help please.
    Thanks

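For reference, here is a minimal sketch of the loop described in comment 16, reusing the URL_GT, downloadGT and readGT functions from the previous post. It requests three-month chunks starting in months 1, 4, 7 and 10 of each year; if a three-month request returns weekly rather than daily values, fall back to one-month chunks (length=1) as in the post above. The year range is only an example, and the download quota limits discussed in the comments still apply.

#Sketch: daily data from 2004 onwards in quarterly chunks
downloadDir="C:/downloads"
keyword="google trends"
years=2004:2014          #example range, adjust as needed
months=c(1, 4, 7, 10)    #start month of each three-month chunk

#Create URLs for every chunk
url=vector()
for(y in years){
    for(m in months){
        url=c(url, URL_GT(keyword, year=y, month=m, length=3))
    }
}

#Download and read the files
filePath=vector()
for(i in 1:length(url)){
    filePath[i]=downloadGT(url[i], downloadDir)
}
dailyData=readGT(filePath)
dailyData=dailyData[order(dailyData$Date),]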