"This broad availability of data has led to increasing interest in methods for extracting useful information and knowledge from data—the realm of data science."
The ultimate goal of data science in business is to improve decision making. Here, I highlight some of the key points from the book Data Science for Business.
Data Science for Business presents three types of concepts: how data scientists fit into an organization, how to think data-analytically, and how to extract knowledge from data. The book is structured around fundamental concepts of data science. Here are some of them:
- Determining similarity between entities. Similarities between entities such as customers can be the basis for a predictive algorithm or for cluster analysis. Similarity is also useful in information retrieval (e.g. search) and recommendations.
- Lift: how much more prevalent a pattern is than expected by chance (i.e. what is the impact of an algorithm, all else being equal).
- Data-driven decision making falls into two categories: decisions based on discoveries made through data analysis, and repeated decisions made at large scale.
- The process of extracting useful information from data can be treated systematically, using standards such as CRISP-DM.
- Be careful about overfitting the data.
- Applying data mining to extract useful solutions requires thinking carefully about the context in which the data is used.
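As a minimal sketch of the similarity idea, the Python below represents each customer as a vector of numeric attributes and finds the most similar existing customer by Euclidean distance, with cosine similarity shown as an alternative measure. The customer names, attributes and values are hypothetical, and Euclidean distance is just one possible choice of measure.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, candidates):
    """Return the name of the candidate closest to `target` (Euclidean distance)."""
    return min(candidates, key=lambda name: euclidean_distance(target, candidates[name]))


# Hypothetical customers: (age in decades, purchases per month, support calls)
customers = {
    "alice": (3.0, 4.0, 0.0),
    "bob": (5.0, 1.0, 2.0),
    "carol": (3.2, 3.5, 0.0),
}
new_customer = (3.1, 3.6, 0.0)
print(most_similar(new_customer, customers))  # carol
```

When attributes are measured on very different scales (age vs. income, say), they are usually normalized first; otherwise the attribute with the largest range dominates the distance.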
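One common way to compute lift, used in market-basket (co-occurrence) analysis, divides the observed probability that two items appear together by the probability expected if they were independent: lift(A, B) = p(A, B) / (p(A) p(B)). The sketch below assumes transactions are given as Python sets; the baskets are made-up data.

```python
def lift(transactions, a, b):
    """Return P(a and b) / (P(a) * P(b)) over a list of item sets.

    1.0 means a and b co-occur exactly as often as chance predicts;
    above 1.0 they co-occur more often, below 1.0 less often.
    """
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("lift is undefined when an item never occurs")
    p_ab = sum(a in t and b in t for t in transactions) / n
    return p_ab / (p_a * p_b)


# Hypothetical shopping baskets
baskets = [
    {"beer", "chips"},
    {"beer", "chips"},
    {"milk"},
    {"bread"},
]
print(lift(baskets, "beer", "chips"))  # 2.0: together twice as often as chance
print(lift(baskets, "beer", "milk"))   # 0.0: never bought together
```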
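A toy simulation can illustrate why overfitting is a concern. The sketch below (not taken from the book) generates a random binary label and many random binary attributes, picks the attribute that best "predicts" the label on a small training set, and then scores that attribute on held-out rows. Since labels and attributes are independent noise, any training accuracy well above 0.5 is spurious. The function name and all parameter values are arbitrary, hypothetical choices.

```python
import random


def simulate_attribute_search(n_attributes=200, n_train=40, n_holdout=200, seed=0):
    """Pick the random attribute that best matches a random label on training
    rows, then return (training accuracy, holdout accuracy) for that attribute."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n = n_train + n_holdout
    labels = [rng.randint(0, 1) for _ in range(n)]
    attributes = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_attributes)]

    def accuracy(attr, rows):
        # Fraction of rows where the attribute value equals the label
        return sum(attr[i] == labels[i] for i in rows) / len(rows)

    train_rows = range(n_train)
    holdout_rows = range(n_train, n)
    # The "discovery" step: choose the attribute with the best training fit
    best = max(attributes, key=lambda attr: accuracy(attr, train_rows))
    return accuracy(best, train_rows), accuracy(best, holdout_rows)


train_acc, holdout_acc = simulate_attribute_search()
print(f"training accuracy: {train_acc:.2f}, holdout accuracy: {holdout_acc:.2f}")
```

The selected attribute typically looks well above chance on the training rows but falls back to roughly 0.5 on the holdout rows, which is why performance should be judged on data the search never saw.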
Where is data mining used?
- Analyze customer behavior
- Credit scoring
- Trading
- Fraud detection
- Workforce management
- Supply-chain management
- Direct marketing
- Online advertising
- Help-desk management
- Search ranking
- Predicting changes in demand at Walmart when a hurricane hits
- Reducing customer churn
Studies show that data-driven companies are more productive.
The data mining process
The analysis is applied to an entity of interest, such as a customer, which can be described by a number of attributes. Relevant attributes can be identified by applying relevant theory (a model) or through an automated, model-agnostic discovery process. A model-agnostic approach is more likely to lead to overfitting.
The future of big data
The authors compare the development of big data to that of the web. In the beginning, companies focused on putting the basic requirements in place to have a presence on the web. Once that was in place, companies started asking what additional benefit they could get out of the Internet. In the same way, many companies have now put basic data management tools in place and are asking themselves how they can take advantage of this resource. Once the capability to process big data sets is in place, companies should ask what new opportunities it could bring.
The book's website: http://www.data-science-for-biz.com/