
The Changing Nature of Predictive Analytics in the Enterprise

Today, an increasing number of institutional clients are looking for solutions, strategies and roadmaps to implement Big Data and Predictive Analytics initiatives within their own organizations. While the exact nature of the solutions and recommendations may differ from client to client, based on a number of factors, like the industry they operate in, the size of their operations and business model, there are common threads that can be applied to their needs.

While looking for these common threads, I came across an interesting white paper titled "Standards in Predictive Analytics" by James Taylor (CEO, Decision Management Solutions), in which he shares his thoughts on the subject. This blog post summarizes some of the key points the author makes, along with some of my own thoughts from engagements with both mid-sized and large clients, within and outside the US, which I hope you find useful.

The changing nature of Predictive Analytics today

In the past, a predictive analytic model was generated using a single proprietary tool (SAS, for example) against a sample of structured data. The model would then be applied in batch to generate scores, which were stored in a database or data warehouse for future use.

Today, there is a focus on "operationalizing analytics", that is, building models and applying them in an organization's day-to-day operations, turning the organization's data into useful, actionable insight (e.g. real-time scoring) which can be used NOW to improve customer engagement, manage risk, reduce fraud, etc.
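To make the contrast between batch scoring and operational scoring concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the logistic model, its coefficients, the feature names (`amount`, `foreign`) and the 0.5 threshold are made up, not taken from the white paper.

```python
import math

# Hypothetical logistic-regression coefficients; in practice these would
# come from a model trained in an analytics tool.
INTERCEPT = -3.0
WEIGHTS = {"amount": 0.002, "foreign": 1.5}


def score(record):
    """Return a fraud probability for one record (a dict of features)."""
    z = INTERCEPT + sum(w * record.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(records):
    """Batch style: score a stored table and keep the scores for later use."""
    return [score(r) for r in records]


def handle_event(record, threshold=0.5):
    """Operational style: score one incoming event and act on it right away."""
    return "review" if score(record) >= threshold else "approve"
```

The model is the same in both cases; what changes is when it runs and whether a decision is attached to the score at the moment the event arrives.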

The biggest mistake that organizations make while trying to build an Analytics and/or Big Data strategy is focusing on the technology before understanding the business problems, opportunities and decisions that need to be made. In other words, it is NOT enough to just ask for greater insight. You need to make the effort to understand the insight and the underlying business problem that the insight will hopefully help solve. Once identified and well understood, the desired insight will naturally drive the analytics and big data requirements.

More Big Data drives more Predictive Analytics

The growth of Predictive Analytics has increasingly merged with the growth of Big Data. Increased digitization and the internet have exponentially increased the amount of data available, as well as the range of data types and the speed at which data arrives. This is commonly described as the "3 Vs" of Big Data: Volume, Variety and Velocity.

Organizations are finding that the data they need for predictive analytics is no longer all structured, and no longer all stored in their databases and data warehouses. There is increasing evidence to support the notion that predictive analytic models built on both structured and unstructured Big Data have a more transformative impact on the business than models built using structured data alone.

3 Core Emerging Themes in Predictive Analytics

The role of R in broadening the predictive analytic ecosystem

R is free and open source, making it appealing as a tool for learning advanced analytics. Because R is open and designed to be extensible, the number of algorithms available for it is huge, with over 5300 packages today. Ironically, this proliferation of packages has led some to ask, "Are there too many R packages today?" (but I digress...).

While scalability and performance have traditionally been issues with R, commercial vendors like Revolution Analytics are providing their own R implementations for big datasets that overcome these limitations.

The role of Hadoop in handling Big Data for predictive analytics

Hadoop consists of two core elements: the Hadoop Distributed File System (HDFS) and the MapReduce programming framework.
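For readers new to the MapReduce programming model, the sketch below imitates its three steps (map, shuffle, reduce) in a single Python process. It is only an illustration of the idea and does not use the Hadoop API; the function names and the (customer, amount) record layout are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    customer, amount = record
    yield customer, amount


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce steps run in parallel on many machines over data held in HDFS; here they run one after another purely to show how the pieces fit together.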

While some newer organizations, like web 2.0 companies, are putting all their data in Hadoop, a mixed database/data warehouse/Hadoop approach is more common. In the mixed environment, Hadoop is used as a landing zone for data, where it can be pre-processed before being moved to a data warehouse. This allows for rapid addition of new data sources to an existing environment. Hadoop is also used as an active archive: older data that in the past might have been archived in an inaccessible form remains available for analysis and can now be used to build predictive models.

The role of PMML in moving to real-time predictive analytics

If predictive analytic models cannot be effectively operationalized and injected into operational systems, there is a risk that they will sit on the shelf and lose value. Most analytic environments are batch oriented and are often only loosely attached to the production environment. There is a need to move models built in a variety of analytic tools into production environments, including workflow engines, business rules management systems, etc. PMML (Predictive Model Markup Language) has emerged as an important way to achieve this.
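As a rough illustration of why an XML model format helps here, the sketch below reads a trimmed PMML regression fragment and scores a record with it, without needing the tool that built the model. The fragment is deliberately incomplete (a full PMML document also carries elements such as a header and data dictionary), the field names and coefficients are made up, and the hand-rolled evaluator handles only this simple linear case. A production system would use a proper PMML scoring engine.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical PMML fragment: a linear regression with two inputs.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="tenure" coefficient="0.25"/>
      <NumericPredictor name="balance" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients from the fragment."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score_with_pmml(model, record):
    """Apply the linear model to one record (a dict of field values)."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
prediction = score_with_pmml(model, {"tenure": 4, "balance": 1.0})
```

The point of the design is that the model travels as data: the analytic tool writes the file, and the operational system only needs to know how to read and apply it.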


Comment by Adrian Walker on October 13, 2015 at 2:41pm

Executable English could be an analytics game changer, bridging the gap between business folks and IT.


Comment by Sione Palu on January 17, 2015 at 5:45pm

We use Matlab and Java as the core of our analytics, and sometimes we use R, called from our Java app (only a tiny amount of R functionality), when that functionality is not already available in either Java or Matlab. We prefer Matlab because it is widely used in academia across various fields of analytics (engineering, physics, math/stats, biology, economics, etc.). When we implement an algorithm from a published paper, we sometimes send our Java or Matlab code to the original authors so they can comment on the correctness of our implementation of the algorithm described in their paper. If we send Java, most reply that they have not worked with Java and only know Matlab, so sometimes we have to port from Java to Matlab before sending the code to the original authors to check. A published algorithm can be well presented for anyone to follow, but that doesn't mean the implementer can always read the minds of the original authors. It is necessary to ask the original authors about the correctness of the implementation, because one can easily misinterpret the derivations of the algorithm and the pseudo-code presented in the paper.

