Differences between Data Mining and Predictive Analytics

What is Data Mining?

Data mining is often applied to data held in a data warehouse. It describes a systematic process of pattern recognition in large data sets, with the aim of identifying relationships and drawing conclusions. Using statistical methods or genetic algorithms, data files can be searched automatically for statistical anomalies, patterns, or rules.
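As a rough illustration of searching data automatically for statistical anomalies, here is a minimal sketch that flags values lying far from the mean. The function name `find_anomalies`, the threshold of 2 standard deviations, and the sample readings are all assumptions made for this example; real data mining tools use far more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The z-score is the distance from the mean measured in (population)
    standard deviations. The default threshold of 2.0 is an arbitrary
    choice for this sketch.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50, 9, 10]
anomalies = find_anomalies(readings)
print(anomalies)
```

A single extreme value inflates both the mean and the standard deviation, which is one reason this simple rule is only a starting point.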

Wikipedia defines data mining as follows: “Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.”

Data mining is a new approach to data:

  • Data Mining is not a simple use of statistical formulas.
  • Data Mining is a key part of the process of collecting and using data.
  • Data Mining is not just Excel spreadsheets with simple fields.
  • Data Mining is the extraction of information from data using computational and statistical techniques.

What are the practical applications of Data Mining?

  • Automated prediction of trends and behaviors
  • Automated discovery of previously unknown patterns and models

Data Mining is listed as one of the “8 Data Analysis Techniques Every Manager Should Understand”:

  • Correlation Analysis
  • Regression Analysis
  • Data Visualization
  • Scenario Analysis
  • Data Mining
  • Monte Carlo Simulation
  • Neural Networks
  • A/B Testing

What are the data mining parameters?

  • Association - the search for patterns in which one event is connected to another event;
  • Sequence or path analysis - the search for patterns in which one event leads to another, later event;
  • Classification - the search for new patterns (which may eventually change how the data is organized);
  • Clustering - finding and visually documenting previously unknown groups of facts;
  • Prediction - discovering patterns in data that can lead to reasonable predictions about the future (this area of data mining is also called predictive analytics).
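To make the association parameter more concrete, the following sketch computes two common measures, support and confidence, over a handful of shopping baskets. The basket data and the function names (`support`, `confidence`, `frequent_pairs`) are invented for this illustration and are not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Toy, made-up shopping baskets (illustrative data only).
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    if with_antecedent == 0:
        return 0.0
    with_both = sum(1 for t in transactions if both <= t)
    return with_both / with_antecedent


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs meeting the minimum support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(TRANSACTIONS, min_support=0.5)
rule_strength = confidence({"bread"}, {"butter"}, TRANSACTIONS)
print(pairs, rule_strength)
```

Here a rule such as "bread implies butter" is judged by how often the two appear together (support) and how often butter appears given that bread does (confidence). Counting every pair directly only works for small data; algorithms such as Apriori prune the search on larger sets.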

What is Predictive Analytics?

According to Wikipedia, “Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”

What’s behind Predictive Analytics?

A prerequisite for predictive analytics is the collection of large, partly unstructured data sets from different sources. Combining external sources such as weather, traffic, and social media data with internal data is particularly important.

Predictive analytics processes this data using statistical methods such as extrapolation, regression, neural networks, or other machine learning techniques to detect patterns in the data and derive algorithms (models) from them. These algorithms are then evaluated against test data and optimized. As a rule, the more data is available, the more accurate the resulting algorithms tend to be. Once the optimization process is finished, the algorithm and the model can be applied to data whose classification is unknown.
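The workflow described above (derive a model from historical data, check it against test data, then apply it to new cases) can be sketched with a simple linear regression. Everything here is an assumption for illustration: the made-up temperature and sales figures, the split into six training and two test observations, and the helper names `fit_line`, `predict`, and `mean_absolute_error`.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Made-up history: (temperature, units sold). Illustrative data only.
history = [(10, 26), (12, 28), (14, 34), (16, 36),
           (18, 42), (20, 44), (22, 49), (24, 53)]

# Step 1: hold back the last two observations as test data.
train, test = history[:6], history[6:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Step 2: derive the model from the training data.
model = fit_line(train_x, train_y)

# Step 3: review the model against the held-back test data.
test_error = mean_absolute_error(model, test_x, test_y)

# Step 4: apply the model to an input whose outcome is unknown.
forecast = predict(model, 30)
print(model, test_error, forecast)
```

In practice the optimization step would compare several candidate models on the test error and keep the best one; this sketch fits only a single model.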

Data Mining vs. Predictive Analytics – Are They the Same?

“Data mining and predictive analytics are often used interchangeably. In fact, the methods and tools of data mining play an essential role in predictive analytics solutions, but predictive analytics goes beyond data mining. For example, predictive analytics also uses text mining, an algorithm-based analysis method for unstructured content such as articles, blogs, tweets, and Facebook posts.”


Comment by Scott Burk on September 8, 2017 at 12:18pm

Jason, I must agree with Mathieu and several others that you need to cite some references and be more rigorous here. I think closed-form solutions, formulas that provide a solution without convergence, are what you are thinking of, at least in some measure. Might want to refine the post.

Comment by Mathieu Landry on November 5, 2016 at 8:25am

@Sione. Perhaps the research literature is a bit off then? Concrete example?

See my previous comment on fundamental semantics...

Comment by ajit jaokar on October 22, 2016 at 12:21am

@Jason good blog. In fact, this was a point of discussion with one of my clients. So useful insights thanks!

Comment by Sione Palu on October 17, 2016 at 5:36pm

Jason Li,

The research literature makes no differentiation between the two. Just look up some of the topics you've touched on in your article; they're all over the place in machine learning journals, statistics, data-mining & what have you. What does that tell you? The link you pointed to is NOT a research journal.

Comment by Jason Li on October 17, 2016 at 3:47pm

Thanks for the comments, Sione and Mathieu!

Data mining and predictive analytics are not the same in my view. The last paragraph of my post is from the following article. You can find more discussion in the original article.

http://ecmapping.com/2016/05/10/data-mining-vs-predictive-analytics...

In addition, the task for data mining is to investigate correlation between measurable variables; it looks for patterns and relationships. But the task for predictive analytics goes beyond that and delivers answers that can guide actions. Even though they are "2 sides of the same coin", the coin's two sides are not exactly the same. 

Comment by Mathieu Landry on October 17, 2016 at 11:54am

Data-mining = the past

Predictive analytics = the future

The past cannot explain and does not predict the future with guarantee. Finding some hidden fundamental rules by data mining can help in the next step, which is prediction.

So they are intimately linked but no, they are not the same in my view. They are parts of a larger process.

Comment by Sione Palu on October 13, 2016 at 9:46am

Don't try to re-define these. Data-mining and Predictive analytics are the same thing. Different word labelling, but both do the same task. Don't get bogged down in word semantics. It is similar to the argument about the difference between Statistics & Machine-Learning. They are 2 sides of the same coin because they hugely overlap.


© 2017 Data Science Central
