Monday, May 19, 2014

Google Trends mass import using R

I was looking for a tool to bulk-download daily Google Trends data for my thesis and couldn't find anything that met my needs, so I wrote one in R. It downloads Google Trends data in monthly snippets for a given search term and a given year or years.

I thought I would share my work here in case someone else has run into the same problem. The file can be found on GitHub: https://gist.github.com/321k/823cce9769e58bc14214.

Make sure you have the quantmod library installed before you run it.
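A quick way to check this from the R console is sketched below. The helper name missing_packages is my own and is not part of the gist; it only reports what is missing and leaves the actual install.packages call to you.

```r
# Return the names of the packages in `pkgs` that are not installed.
# (missing_packages is a hypothetical helper, not part of the gist.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages("quantmod")
if (length(needed) > 0) {
  # Install with install.packages(needed) before running the gist.
  message("Missing packages: ", paste(needed, collapse = ", "))
}
```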

Getting the data

Since the code works by downloading the file through the browser, it is not affected by changes to Google's authentication policy. Simply make sure that you are signed in to your Google account. This does, however, mean that the data download is slow. I recommend using Firefox with a tab manager to close the tabs after a download has been completed. Tab Mix Plus 0.4.1.3.1 works great for me. You will also need to check the box in the download prompt that lets Firefox download without prompting you. Finally, you need to set the download directory to an empty folder.
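Since the download folder has to be empty, it can be worth checking that from R before starting a long run. Here is a minimal sketch; is_empty_dir is a hypothetical helper of mine, and the temporary folder is only a stand-in for whatever download directory you have configured in Firefox.

```r
# TRUE when `path` exists and contains no files or subfolders.
# (is_empty_dir is a hypothetical helper, not part of the gist.)
is_empty_dir <- function(path) {
  dir.exists(path) &&
    length(list.files(path, all.files = TRUE, no.. = TRUE)) == 0
}

# Stand-in folder so the example runs anywhere; use your own download folder.
download_dir <- file.path(tempdir(), "gt_downloads")
dir.create(download_dir, showWarnings = FALSE)

if (!is_empty_dir(download_dir)) {
  stop("Download folder is not empty: ", download_dir)
}
```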

There are four functions: downloadGT, importGT, formatGT, and mergeGT. downloadGT takes two inputs: the years you want to download and the search query you want to get. To run it for multiple queries, simply add a loop:
queries=c("MSFT", "AAPL")
years=c("2012", "2013")
for(i in seq_along(queries)) {downloadGT(years, queries[i])}
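To get a feel for how many files such a loop produces, you can list the expected downloads up front. The sketch below assumes one file per search query per month, which is my reading of the "monthly snippets" described above rather than something taken from the gist's code.

```r
queries <- c("MSFT", "AAPL")
years <- c("2012", "2013")

# One row per (query, year, month) combination, i.e. one row per expected file
# (assumption: one download per query and month).
jobs <- expand.grid(month = 1:12, year = years, query = queries,
                    stringsAsFactors = FALSE)

nrow(jobs)  # 2 queries * 2 years * 12 months = 48
```

By the same arithmetic, ten queries over the ten years from 2004 to 2013 come to 10 * 10 * 12 = 1200 files.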

Formatting the data

Once we have downloaded the files, we need to import them into R and put them in a usable format. importGT reads the data, formatGT extracts the time series data, and mergeGT puts the individual months together into a complete time series for each company.
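# folder that the browser downloaded the files to
path="C:/data"
# read the raw files, extract the time series, then merge the months
rawData=importGT(path)
formattedData=formatGT(rawData)
mergedData=mergeGT(formattedData)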
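To picture what putting the individual months together means, here is a toy sketch with made-up numbers. It only shows stacking monthly pieces and sorting them by date; the column names and values are hypothetical, and it makes no attempt to reproduce what mergeGT does internally.

```r
# Hypothetical monthly pieces for one query (made-up values and layout).
jan <- data.frame(date = as.Date("2013-01-01") + 0:2, hits = c(40, 42, 39))
feb <- data.frame(date = as.Date("2013-02-01") + 0:2, hits = c(55, 51, 53))

# Stack the pieces in whatever order they arrive, then sort by date
# to get one continuous series.
pieces <- list(feb, jan)
series <- do.call(rbind, pieces)
series <- series[order(series$date), ]
rownames(series) <- NULL
```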
And there you have it. I'm currently in the process of downloading the Google Trends data for the FTSE 100 between 2004 and 2013. Google's quota limit allows me to download about ten companies per day, or 1200 files. Feel free to leave a comment or suggestion.
