Any reference to Hadoop implies a huge amount of data. The intent, of course, is to derive insights from that data that help businesses stay competitive. "Scoring" the data is a common exercise in determining, for example, customer churn, fraud, or risk. It is one of the slowest analytics activities, especially when very large data sets are involved. Various fast scoring products exist on the market, but they tend to be highly specialized and/or supplied by a single vendor, usually requiring the entire scoring process to be carried out with that vendor's tool set. This poses a problem for those who build their scoring models with tools other than those of the scoring-engine vendor.
There is a more democratic way of doing scoring. It relies on the industry standard PMML (Predictive Model Markup Language). Any model-building tool that can export its models in PMML can have its data scored in Hadoop/Hive in a flash using a universal PMML plug-in (UPPI) that resides inside Hadoop/Hive.
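To make the interchange concrete, here is a minimal sketch of what an exported PMML document looks like. The structure (`PMML`, `Header`, `DataDictionary`, `RegressionModel`) follows the DMG PMML 4.2 schema; the model itself, the field names (`tenure`, `monthly_charges`, `churn_score`), and the coefficients are purely illustrative:

```xml
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Illustrative churn-scoring model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="monthly_charges" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <!-- A simple linear regression: any PMML-capable tool could emit this -->
  <RegressionModel modelName="ChurnModel" functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="monthly_charges"/>
      <MiningField name="churn_score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.2">
      <NumericPredictor name="tenure" coefficient="-0.01"/>
      <NumericPredictor name="monthly_charges" coefficient="0.005"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
```

Because the file is plain, vendor-neutral XML, the tool that produced it and the engine that scores it never need to come from the same vendor.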
Hive makes it easy to analyze large datasets stored in Hadoop-compatible file systems. By providing a mechanism to project structure onto the data, Hive allows queries to be expressed in a SQL-like language called HiveQL.
Once deployed in UPPI, predictive models expressed in PMML become SQL functions that can be invoked directly in HiveQL. In this way, UPPI offers Hadoop users a combination of open standards, performance, and scalability for the application of predictive analytics.
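A hedged sketch of what invoking such a scoring function looks like in HiveQL. The jar path, the Java class, the function name `score_churn`, and the table and column names are all hypothetical; the exact registration step depends on the UPPI distribution, but the general pattern follows Hive's standard UDF mechanism:

```sql
-- Register the plug-in's scoring function as a Hive UDF (names illustrative)
ADD JAR /opt/uppi/uppi-hive.jar;
CREATE TEMPORARY FUNCTION score_churn AS 'com.example.uppi.ChurnScoreUDF';

-- Score every record in place, as an ordinary HiveQL query
SELECT customer_id,
       score_churn(tenure, monthly_charges) AS churn_score
FROM   customers;
```

Because the scoring runs as a UDF inside the query itself, it is distributed across the cluster by Hadoop like any other Hive workload, with no data export step.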
UPPI for Hadoop/Hive delivers fast, scalable scoring for Big Data while retaining compatibility with most major data mining tools through the PMML standard. It also brings the scalability of Hadoop to the execution of predictive analytics.