Summary: Recently we’ve been profiling Automated Machine Learning (AML) platforms, both the professional variety and, in particular, the proprietary one-click-to-model variety being pitched to untrained analysts and line-of-business managers. Since our first article, readers have suggested some additional companies we should look at. They are profiled here, along with some interesting observations about who is buying and why.
Recently we’ve written a series of articles on Automated Machine Learning (AML), meaning platforms or packages designed to take over the most repetitive elements of preparing predictive models. Typically these cover cleaning, preprocessing, some feature engineering, feature selection, and then model creation using one or several algorithms, including hyperparameter optimization. Most will then offer code export and an API for scoring.
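To make that sequence of steps concrete, here is a minimal sketch in plain Python. It is not taken from any of the products discussed; the function names, the synthetic data, the nearest-centroid model, and the choice of "number of features kept" as the single hyperparameter are all assumptions made for illustration. Real AML tools would typically delegate each step to libraries such as scikit-learn.

```python
import random
import statistics

def impute_median(rows):
    """Cleaning: replace missing values (None) with the column median."""
    cols = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def standardize(rows):
    """Preprocessing: rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def rank_features(rows, labels):
    """Feature selection: order columns by the gap between class means."""
    def gap(j):
        ones = [r[j] for r, y in zip(rows, labels) if y == 1]
        zeros = [r[j] for r, y in zip(rows, labels) if y == 0]
        return abs(statistics.fmean(ones) - statistics.fmean(zeros))
    return sorted(range(len(rows[0])), key=gap, reverse=True)

def fit_centroids(rows, labels, features):
    """Model creation: a nearest-centroid classifier on the chosen columns."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [statistics.fmean(r[j] for r in members) for j in features]
    return centroids

def predict(centroids, features, row):
    """Assign the class whose centroid is closest to the row."""
    def dist(cls):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[cls]))
    return min(centroids, key=dist)

def auto_fit(rows, labels, k_grid=(1, 2, 3, 4)):
    """Run every step and pick k (features kept) by holdout accuracy.

    For brevity the scaler is fit on all rows; a production pipeline
    would fit it on the training split only.
    """
    clean = standardize(impute_median(rows))
    split = int(0.7 * len(clean))
    train_x, test_x = clean[:split], clean[split:]
    train_y, test_y = labels[:split], labels[split:]
    ranking = rank_features(train_x, train_y)
    best = None
    for k in k_grid:  # hyperparameter search over the number of features
        feats = ranking[:k]
        model = fit_centroids(train_x, train_y, feats)
        hits = sum(predict(model, feats, r) == y for r, y in zip(test_x, test_y))
        acc = hits / len(test_x)
        if best is None or acc > best["accuracy"]:
            best = {"k": k, "features": feats, "accuracy": acc}
    return best

def make_data(n=200, seed=0):
    """Synthetic data: two informative columns, two noise columns, some gaps."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        row = [rng.gauss(2.0 * y, 1.0), rng.gauss(-1.5 * y, 1.0),
               rng.gauss(0, 1), rng.gauss(0, 1)]
        if rng.random() < 0.05:
            row[rng.randrange(4)] = None  # simulate a missing value
        rows.append(row)
        labels.append(y)
    return rows, labels

rows, labels = make_data()
result = auto_fit(rows, labels)
print(result)
```

The single call to auto_fit is the "data in, model out" idea in miniature: the caller supplies raw rows and labels and gets back the chosen settings and a holdout accuracy, without touching any individual step.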
Two Major Schools
These are grouped into two major schools. The first, tools for professional data scientists, consists of open source packages in Python or R that are integrated with common libraries like scikit-learn. Since they require knowledge of these common data science scripting languages, it’s unlikely that non-data scientists will try to use them, hence our “professional” title.
The second school, however, should capture our attention for different reasons. These are commercial platforms with well-designed UIs requiring no code, but offering the same or greater levels of automation. We call these One-Click Data-In Model-Out, and while some are designed to appeal to professional users, others are clearly targeting citizen data scientists and analysts with little or no formal data science training.
These One-Clicks in particular offer some professional advantages.
Some Market Observations
In talking with these developers, those pitching most directly to the non-data scientist market are indeed getting pushback from internal data science teams. They also report getting a warm reception from line-of-business managers who are suffering from the bottlenecks.
An interesting theme that emerges is that sales are most likely where no formal in-house data science group exists or where the company is currently outsourcing its model building. According to one recent study, this group still accounts for about 60% of all businesses, though we all understand that penetration in the largest companies is already 100%.
Another observation is that the greatest pushback comes from companies that have teams of very young data scientists. The explanation provided is that these recent grads still think that all the operational requirements of predictive analytics can be performed directly in R or Python, the languages in which they were taught. Teams of more mature data scientists have figured out the limitations of using scripting languages and are more inclined to accept proprietary platforms as an additional tool.
Who Is In the Market?
In our first article we reviewed:
Thanks to our readers we’ve identified three additional competitors that we describe here. (In alphabetical order:)
I had the pleasure of a demo and long conversation with Nikolai Lyashenko, Chief Data Scientist, and Marc Bir, Chief Technology Officer. Compellon is firmly placed in the one-click non-data scientist market and has a very nicely developed UI with an emphasis on interpretability and transparency suitable for regulated markets.
This includes displaying the variable impacts for each model down to the individual customer level, showing that the decision for one customer may have been influenced by different variables than the decision for another. It also facilitates ‘what if’ analysis, which would be most relevant to LOB users.
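As a rough illustration of what customer-level variable impacts and ‘what if’ analysis can look like, here is a small Python sketch. It is emphatically not Compellon’s method, which is proprietary; it assumes a simple logistic (GLM-style) churn model, and the coefficients, population means, variable names, and customers are all invented for the example.

```python
import math

# Hypothetical churn model: every number below is made up for illustration.
COEFS = {"tenure_years": -0.6, "support_calls": 0.8, "monthly_fee": 0.03}
INTERCEPT = -1.0
MEANS = {"tenure_years": 3.0, "support_calls": 1.0, "monthly_fee": 50.0}

def score(customer):
    """Predicted churn probability from the logistic model."""
    z = INTERCEPT + sum(COEFS[k] * customer[k] for k in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def impacts(customer):
    """Log-odds contribution of each variable relative to an average customer."""
    return {k: COEFS[k] * (customer[k] - MEANS[k]) for k in COEFS}

def top_driver(customer):
    """The variable with the largest absolute impact for this customer."""
    contrib = impacts(customer)
    return max(contrib, key=lambda k: abs(contrib[k]))

def what_if(customer, **changes):
    """Re-score the customer after overriding one or more variables."""
    return score({**customer, **changes})

alice = {"tenure_years": 0.5, "support_calls": 1, "monthly_fee": 55.0}
bob = {"tenure_years": 3.0, "support_calls": 5, "monthly_fee": 50.0}

for name, customer in (("alice", alice), ("bob", bob)):
    print(name, round(score(customer), 3), top_driver(customer), impacts(customer))

# What if Bob's support calls dropped to the population average?
print("bob, fewer calls:", round(what_if(bob, support_calls=1), 3))
```

In this toy model the two customers are both flagged as likely to churn, but for different reasons: short tenure dominates for one and support calls for the other. That is the kind of per-customer explanation an LOB user can act on.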
What’s distinctly different about Compellon is the underlying data science. They have developed a proprietary AI model generator that does not rely on any of the known statistical modeling algorithms and is therefore difficult to describe. Based on Nikolai Lyashenko’s own lifetime research in information theory, the engine quickly produces good quality models without reference to classical feature selection or hyperparameter tuning. Lyashenko says that the generator frequently produces models with the characteristics of deep neural nets, but that DNNs are not used in model creation.
Compellon also uses a unique definition of feature engineering relating to identifying and ‘combining’ variables with extremely strong predictive capability. Classical data cleaning and feature engineering are not required but could be accomplished outside the system before submitting the data.
Another interesting feature not found in other one-clicks is the optional ability to run a segmentation as part of the model that can conceivably deliver back as many as 10 segment-specific models.
They plan to publish comparative benchmark accuracy data later this year and claim that their current in-house tests have shown very high levels of fitness.
I had the pleasure of a demo and conversation with DMWay CEO Gil Nizri and CTO Ronen Meiri. They have elected to pursue an ultra-simple but sophisticated approach targeting non-data scientists. DMWay offers only GLM as a modeling tool and has developed a nice suite of preprocessing and feature selection tools to round out their easy-to-use platform. Their focus, at least initially, is on regulated markets (banking, insurance and lending) where interpretability is key and the volume of models is high, but they are looking to expand to all types of users.
Based on my interview with Frank Vanden Berghen, the Director and founder of Business Insights and the TIMi platform, the company is pursuing both the professional data science market and the non-professional one-click market. For the latter, TIMi (The Intelligent Mining Machine) comes with the expected much simpler UI allowing fully automated operation.
Based in Belgium with representation in the US, TIMi is the only suite we encountered that laid claim to many years of significant wins and high placings in various competitions, including most recently a 9th place in a 2015 Kaggle contest. TIMi also offers complete SAS integration. Some clients are reported to be using TIMi up through feature engineering, then exporting the dataset.
Having That Conversation with Management
Whether you are pro or con automated machine learning, there will come a time when you have to explain to management the risks involved in allowing non-data scientists to produce production predictive models. Management will also want to know how those risks can be mitigated and whether this should be allowed at all. Chances are very high that making this explanation will fall to you, the in-house data scientist.
How you choose to handle this, with resistance or by ‘democratizing’ the process, is up to you. It’s clear, however, that the AML market is just gaining traction and that you’ll be seeing more and more of these platforms.
To most of us this seems like a natural progression to automate what can be safely automated and preserve our time for the creative portions of data science. We suggest you be proactive and check out some of these, then draw your own conclusions.
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001. He can be reached at: