Monday, November 24, 2014

Currently reading: Data Science for Business


"This broad availability of data has led to increasing interest in methods for extracting useful information and knowledge from data—the realm of data science."

The ultimate goal of data science in business is to improve decision making. Here, I highlight some of the key points from the book Data Science for Business.

There are three types of concepts presented in Data Science for Business: how data scientists fit into an organization, thinking data-analytically, and extracting knowledge from data. The book is structured around fundamental concepts of data science. Here are some of them:

  • Determining similarity between entities. Finding similarities between, for instance, customers can be the basis for a predictive algorithm or for cluster analysis. It is also useful in information retrieval (e.g. search) and recommendations.
  • Lift: how much more prevalent a pattern is than expected by chance (i.e. what is the impact of an algorithm, all else being equal).
  • Data-driven decision making falls into two categories: decisions that depend on discoveries made through data analysis, and repeated decisions made at large scale.
  • The process of extracting useful knowledge from data can be treated systematically, using standards such as CRISP-DM (Cross-Industry Standard Process for Data Mining).
  • Be careful about overfitting the data.
  • Applying data mining to extract useful solutions requires thinking carefully about the context in which the data is used.
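To make the first two concepts concrete, here is a minimal sketch in Python. It assumes customers are described by numeric attribute vectors and uses Euclidean distance as one common similarity measure (smaller distance means more similar). It also computes lift for the co-occurrence of two events A and B as p(A, B) / (p(A) * p(B)). The customer names, attributes, and counts are hypothetical illustrations, not data from the book.

```python
import math

def euclidean_distance(a, b):
    """Distance between two entities described by numeric attribute vectors.
    Smaller distance means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, candidates):
    """Return the name of the candidate closest to `target` (its nearest neighbor)."""
    return min(candidates, key=lambda name: euclidean_distance(target, candidates[name]))

def lift(n_both, n_a, n_b, n_total):
    """Lift of the co-occurrence of events A and B, computed from raw counts:
    p(A, B) / (p(A) * p(B)) == n_both * n_total / (n_a * n_b).
    1.0 means A and B co-occur exactly as often as chance predicts."""
    return (n_both * n_total) / (n_a * n_b)

# Hypothetical customers described by [age, yearly purchases].
customers = {"alice": [34, 12], "bob": [61, 2], "carol": [29, 15]}
print(most_similar([30, 14], customers))  # carol

# Hypothetical baskets: 1000 total, 100 contain A, 200 contain B, 50 contain both.
print(lift(n_both=50, n_a=100, n_b=200, n_total=1000))  # 2.5
```

A lift of 2.5 means A and B appear together 2.5 times as often as they would if they were independent. In practice, attributes with very different ranges are usually normalized before computing distances, so that one attribute does not dominate the similarity.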
More to come as I read through the book.

Where is data mining used?
  • Analyze customer behavior
  • Credit scoring
  • Trading
  • Fraud detection
  • Workforce management
  • Supply-chain management
  • Direct marketing
  • Online advertising
  • Help-desk management
  • Search ranking
Examples
  • Predicting changes in demand at Walmart when a hurricane hits
  • Reducing customer churn
Before deciding to use data mining and algorithms, it is important to first define what the deliverable of the analysis should be. By first asking why, it is easier to then tackle the how.

Studies show that data-driven companies are more productive.

The data mining process

The analysis is applied to an entity of interest, such as a customer. The customer can be described by a number of attributes. Relevant attributes can be identified by applying domain theory (a model), or through an automated, model-agnostic discovery process. A model-agnostic approach is more likely to lead to overfitting.
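One automated way to judge how informative an attribute is about a target such as churn is information gain: the reduction in entropy of the target after splitting the entities on that attribute. The sketch below is a minimal illustration under stated assumptions. The churn records and the attribute names `contract` and `region` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    parent = [row[target] for row in rows]
    children = {}
    for row in rows:
        children.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(ch) / len(rows) * entropy(ch) for ch in children.values())
    return entropy(parent) - weighted

# Hypothetical churn records; attribute names are illustrative only.
rows = [
    {"contract": "monthly", "region": "north", "churned": "yes"},
    {"contract": "monthly", "region": "south", "churned": "yes"},
    {"contract": "yearly",  "region": "north", "churned": "no"},
    {"contract": "yearly",  "region": "south", "churned": "no"},
]

# Rank candidate attributes from most to least informative about churn.
ranked = sorted(["contract", "region"],
                key=lambda a: information_gain(rows, a, "churned"),
                reverse=True)
print(ranked)  # ['contract', 'region']
```

Here `contract` separates churners perfectly (gain 1.0 bit) while `region` tells us nothing (gain 0.0). With so few records, though, an automated search over many candidate attributes can find a high gain purely by chance, which is exactly the overfitting risk mentioned above; checking the chosen attributes on holdout data that was not used for the search guards against this.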

The future of big data

The authors compare the development of big data to that of the web. In the beginning, companies focused on putting the basic requirements in place to have a presence on the web. Once that was in place, companies started asking what additional benefit they could get out of the Internet. In the same way, many companies have put the basic data management tools in place and are now asking themselves how they can take advantage of this resource. Once the capability to process big data sets is in place, companies should ask themselves what new opportunities it could bring.

The book's website: http://www.data-science-for-biz.com/
