More on Fully Automated Machine Learning

Summary: Recently we’ve been profiling Automated Machine Learning (AML) platforms, both the professional variety and, in particular, the proprietary one-click-to-model variety being pitched to untrained analysts and line-of-business managers. Since our first article, readers have suggested some additional companies we should look at. They are profiled here, along with some interesting observations about who is buying and why.

Recently we’ve written a series of articles on Automated Machine Learning (AML), meaning platforms or packages designed to take over the most repetitive elements of preparing predictive models. Typically these cover cleaning, preprocessing, some feature engineering, feature selection, and then model creation using one or several algorithms, including hyperparameter optimization. Most then offer code export and an API for scoring.
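
For readers who want to see what those steps look like in code, here is a minimal sketch using scikit-learn. It is not any vendor's implementation: the synthetic dataset, the two candidate algorithms, and the parameter grids are all assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an analytic flat file (illustrative assumption).
X, y = make_classification(n_samples=400, n_features=12, n_informative=4, random_state=0)

# Blank out roughly 5% of the values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Cleaning, preprocessing, feature selection, and model creation as one pipeline.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),  # placeholder, replaced by the grid
])

# Each dictionary tries one algorithm together with its own hyperparameters.
param_grid = [
    {
        "select__k": [4, 8],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [4, 8],
        "model": [RandomForestClassifier(random_state=0)],
        "model__n_estimators": [50, 100],
    },
]

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")
search.fit(X_train, y_train)

# The refitted best pipeline plays the role of the "champion" model.
champion = search.best_estimator_
test_accuracy = champion.score(X_test, y_test)
print(type(champion.named_steps["model"]).__name__,
      round(search.best_score_, 3), round(test_accuracy, 3))
```

The fitted `champion` pipeline can be persisted and called from a scoring service, which is roughly what the code export and scoring API features provide without any of this being written by hand.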

Two Major Schools

These fall into two major schools. Tools for professional data scientists are open source packages in Python or R that integrate with common libraries like scikit-learn. Since they require knowledge of these common data science scripting languages, it’s unlikely that non-data scientists will try to use them, hence our “professional” title.

The second school however should capture our attention for different reasons. These are commercial platforms with well-designed UIs requiring no code, but offering the same or greater levels of automation. We call these One-Click Data-In Model-Out and while some are designed to appeal to professional users, others are clearly targeting citizen data scientists and analysts with little or no formal data science training.

These One-Clicks do offer some advantages to professionals.

For shops producing a high volume of customer behavior models or regression value forecasts, these can be real time savers. The best of this bunch run all the modern algorithms and even create ensembles. The ones I’ve tested produced very good results, and you can’t beat them for speed. Even SAS and SPSS have incorporated some of these features into their various products. So whether you’re producing a lot of models or you just don’t have enough trained data scientists to keep up, these are worth a look.

Their second marketing target, however, is a bit more controversial. These platforms are also (and sometimes exclusively) pitched to the untrained or at least lesser-trained analyst and line-of-business user as a way to bypass data scientists. Yes, the shortage of data scientists may create bottlenecks. Add the fact that Gartner says this citizen data scientist market is 5X the size of the professional market, and it’s easy to see why these platform developers would slant this way.

Some Market Observations

In talking with these developers, those pitching most directly to the non-data scientist market are indeed getting pushback from internal data science teams. They also report getting a warm reception from line-of-business managers who are suffering from the bottlenecks.

An interesting theme that emerges is that sales are most likely where no formal in-house data science group exists or where the company currently outsources its model building. According to one recent study, companies in this position still account for about 60% of all businesses, though we all understand that penetration in the largest companies is already 100%.

Another observation is that the greatest pushback comes from companies that have teams of very young data scientists. The explanation offered is that these recent grads still think all the operational requirements of predictive analytics can be met directly in R or Python, the languages in which they were taught. Teams of more mature data scientists have figured out the limitations of scripting languages and are more inclined to accept proprietary platforms as an additional tool.

Who Is In the Market?

In our first article we reviewed:

DataRobot (www.DataRobot.com)
MLJAR (www.mljar.com)
PurePredictive (www.PurePredictive.com)
Xpanse Analytics (www.xpanseanalytics.com)

Thanks to our readers we’ve identified three additional competitors that we describe here. (In alphabetical order:)

Compellon (www.compellon.com)

I had the pleasure of a demo and long conversation with Nikolai Lyashenko, Chief Data Scientist, and Marc Bir, Chief Technology Officer. Compellon is firmly placed in the one-click non-data scientist market and has a very nicely developed UI with an emphasis on interpretability and transparency suitable for regulated markets.

This includes displaying the variable impacts for each model down to the individual customer level showing that the decision for one customer may have been influenced by different variables than for another customer. It also facilitates ‘what if’ analysis that would be most relevant to LOB users.
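
To make the idea of customer-level variable impacts concrete, here is a small sketch. Compellon's engine is proprietary, so this does not reflect how they compute impacts; it assumes an ordinary logistic model and measures each variable's contribution to the log-odds relative to an average customer. The coefficients, means, variable names, and customer records are all made-up values.

```python
import math

# Hypothetical logistic model: every number below is invented for illustration.
INTERCEPT = -1.0
COEFFICIENTS = {"tenure_months": -0.04, "support_calls": 0.60, "monthly_spend": 0.01}
TRAINING_MEANS = {"tenure_months": 24.0, "support_calls": 1.5, "monthly_spend": 70.0}


def predict_probability(customer):
    """Score one customer with the logistic model."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * customer[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def variable_impacts(customer):
    """Return (variable, impact) pairs, largest absolute impact first.

    Impact is the variable's contribution to the log-odds compared with
    a customer who sits exactly at the training mean.
    """
    impacts = {
        name: COEFFICIENTS[name] * (customer[name] - TRAINING_MEANS[name])
        for name in COEFFICIENTS
    }
    return sorted(impacts.items(), key=lambda item: abs(item[1]), reverse=True)


customer_a = {"tenure_months": 3.0, "support_calls": 2.0, "monthly_spend": 65.0}
customer_b = {"tenure_months": 30.0, "support_calls": 6.0, "monthly_spend": 80.0}

for label, customer in (("A", customer_a), ("B", customer_b)):
    top_variable, top_impact = variable_impacts(customer)[0]
    print(label, round(predict_probability(customer), 3), top_variable, round(top_impact, 2))
```

In this toy example the score for customer A is driven mostly by short tenure, while customer B's is driven mostly by support calls. Changing one input and re-scoring the customer is the simplest form of ‘what if’ analysis.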

What’s distinctly different about Compellon is the underlying data science. They have developed a proprietary AI model generator that does not rely on any of the known statistical modeling algorithms and is therefore difficult to describe. Based on Nikolai Lyashenko’s own lifetime research in information theory, the engine quickly produces good quality models without reference to classical feature selection or hyperparameter tuning. Lyashenko says that the generator frequently produces models with the characteristics of deep neural nets but that DNNs are not used in model creation.

Compellon also uses a unique definition of feature engineering relating to identifying and ‘combining’ variables with extremely strong predictive capability. Classical data cleaning and feature engineering are not required but could be accomplished outside the system before submitting the data.

Another interesting feature not found in other one-clicks is the optional ability to run a segmentation as part of the model that can conceivably deliver back as many as 10 segment-specific models.

They plan to publish comparative benchmark accuracy data later this year and claim that their current in-house tests have shown very high levels of fitness.

Blending: no, starts with analytic flat file.
Cleanse: no, missing data, outliers, miscodes need to be handled before loading. However Compellon states that their proprietary engine requires very little preprocessing or cleaning.
Impute and Transform: Not in the classical sense. See description above.
Feature Engineering: Again not in the classical sense though Compellon describes their system’s identification of ‘super predictors’ as a form of data engineering.
Feature Selection: yes.
Select ML Algorithms to be utilized: Proprietary AI-based model generator.
Create Ensembles: no.
Run Algorithms in Parallel: no.
Adjust Algorithm Hyperparameters during model development: no – not relevant to their proprietary engine.
Select and deploy: Only a single champion model is presented. Deploy via Java or API.

DMWay (www.DMWay.com)

I had the pleasure of a demo and conversation with DMWay CEO Gil Nizri and CTO Ronen Meiri. They have elected to pursue an ultra-simple but sophisticated approach targeting non-data scientists. DMWay offers only GLM as a modeling tool and has developed a nice suite of preprocessing and feature selection tools to round out their easy-to-use platform. Their focus, at least initially, is on regulated markets (banking, insurance and lending) where interpretability is key and the volume of models is high, but they look to expand to all types of users.

Blending: no, starts with analytic flat file.
Cleanse: yes
Impute and Transform: yes
Feature Engineering: Some. More sophisticated automatic creation of, for example, ratios from related variables is to follow.
Feature Selection: yes
Select ML Algorithms to be utilized: Only GLM, selection of linear or logistic.
Create Ensembles: no.
Run Algorithms in Parallel: no, only one algorithm to run.
Adjust Algorithm Hyperparameters during model development: yes – some access to adjust these for knowledgeable users.
Select and deploy: Only a single champion model is presented. Deploy via R, Java, or SQL.
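
The choice between a linear and a logistic GLM usually follows from the target variable. The helper below is a hypothetical sketch of that rule of thumb, not DMWay's actual logic: a target with exactly two distinct values suggests logistic regression, and a numeric target with more values suggests linear regression.

```python
def choose_glm_family(target_values):
    """Suggest 'logistic' for a two-class target, otherwise 'linear'.

    Hypothetical helper based on a simple heuristic; a real platform
    may ask the user or apply additional checks.
    """
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("target needs at least two distinct values")
    if len(distinct) == 2:
        return "logistic"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "linear"
    raise ValueError("non-numeric target with more than two classes is not supported")


print(choose_glm_family([0, 1, 1, 0]))          # two classes
print(choose_glm_family([12.5, 40.0, 7.25]))    # continuous values
print(choose_glm_family(["yes", "no", "yes"]))  # two string labels
```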

TIMi Suite from Business Insights (www.timi.eu or www.business-insight.com)

According to my interview with Frank Vanden Berghen, the director and founder of Business Insights and the TIMi platform, they are pursuing both the professional data science market and the non-professional one-click market. For the latter, TIMi (The Intelligent Mining Machine) comes with the expected much simpler UI, allowing fully automated operation.

Based in Belgium with representation in the US, TIMi is the only suite we encountered that laid claim to many years of significant wins and high placing in various competitions, including most recently a 9th place in a 2015 Kaggle contest. TIMi also offers complete SAS integration. Some clients are reported to be using TIMi up through feature engineering then exporting the dataset.

Blending: yes with their separate Anatella ETL product including direct access to HDFS.
Cleanse: yes
Impute and Transform: yes
Feature Engineering: yes including at least ratios, binning, and data range conversions.
Feature Selection: yes
Select ML Algorithms to be utilized: no, automatically runs logistic regression, decision trees, and elastic net.
Create Ensembles: no.
Run Algorithms in Parallel: yes.
Adjust Algorithm Hyperparameters during model development: yes – access to adjust these in the professional version interface.
Select and deploy: Only a single champion model is presented. Deploy via Java, SQL, or PMML.
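
Running a fixed set of algorithms and presenting a single champion can be sketched in a few lines of scikit-learn. This is an illustration of the general pattern only, not TIMi's implementation. The synthetic data, the settings, and the use of an elastic-net-penalized logistic regression to stand in for "elastic net" on a classification target are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (illustrative assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=1)

# The three candidate algorithms, each with fixed, untuned settings.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=1),
    "elastic_net": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000),
    ),
}

# Score every candidate with the same 5-fold cross-validation.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Keep only the best scorer and refit it on all the data.
champion_name = max(cv_scores, key=cv_scores.get)
champion = candidates[champion_name].fit(X, y)
print(champion_name, {name: round(score, 3) for name, score in cv_scores.items()})
```

Passing `n_jobs=-1` to `cross_val_score` spreads the folds for each candidate across CPU cores, a simple stand-in for the parallel execution these platforms advertise.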

Having That Conversation with Management

Whether you are pro or con automated machine learning, there will come a time when you have to explain to management the risks involved in allowing non-data scientists to produce production predictive models. Management will also want to know how those risks can be mitigated and whether this should be allowed at all. Chances are very high that making this explanation will fall to you, the in-house data scientist.

How you choose to handle this, with resistance or by ‘democratizing’ the process, is up to you. It’s clear, however, that the AML market is just gaining traction and that you’ll be seeing more and more of these platforms.

To most of us this seems like a natural progression to automate what can be safely automated and preserve our time for the creative portions of data science. We suggest you be proactive and check out some of these, then draw your own conclusions.

Additional articles on Automated Machine Learning

Next Generation Automated Machine Learning (AML) (April 2018)

Automated Machine Learning for Professionals (July 2017)

Data Scientists Automated and Unemployed by 2025 – Update! (July 2017)

Data Scientists Automated and Unemployed by 2025! (April 2016)

About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.