
Declarative Machine Learning Alone isn’t Enough for the Data Science Community

  • Jorge Torres 

Today, the machine learning (ML) market is the largest segment of the artificial intelligence (AI) market and is expected to grow from $22.6 billion to a staggering $126 billion over the next three years. Use cases for ML are seemingly infinite, from automated query responses and stock trading to recommendation engines and customer experience enhancements. In the last few years, we’ve watched ML move from an academic pursuit to an essential technology applied in every aspect of computing. The fact that ML is one of the three most in-demand tech skills in 2022, alongside DevOps and AI, is a testament to its emergence as a cornerstone technology for businesses of all shapes, sizes, and industries.

By leveraging ML, businesses can accelerate repetitive tasks and reassign human resources to higher-value activities. Low-risk decision-making can be outsourced to technology, and forecasts can be generated that empower human users to make better high-risk decisions. The rise of in-database machine learning accelerates these capabilities further, allowing businesses to close the gap between data reporting and data insight, and enabling decision-makers to act quickly with 360-degree awareness.  

ML is so useful and so pervasive that great strides have been taken to democratize the technology and make it available to businesses that might otherwise lack the expertise or capital to take advantage of it. In this article, we’re going to explore the various attempts to democratize ML, from AutoML to Declarative ML, including how they work and why they might be good, but not quite good enough.

The democratization of machine learning  

Until recently, ML was the preserve of companies with established in-house expertise and the endless budgets to match. These organizations could afford to write carefully customized ML algorithms line by line for specific purposes. According to a recent survey, only a quarter of businesses said that creating an ML model took them less than a month, whereas nearly four in ten said it could take up to a whole year. This creates a bottleneck: countless businesses see the value in ML, but few are able to truly capitalize on it.

However, the next chapter of ML looks very different. Rather than requiring users to fully understand how models are trained and used to generate predictions, or to keep a team of coding experts on hand to tailor and tweak algorithms for a particular use case, smaller teams, and even those without coding skills, will be able to deploy and use ML to their advantage.

The rise of AutoML

AutoML was one of the first steps on this journey. It enables developers with limited machine learning expertise to train high-quality models specific to their business needs. AutoML is well suited to the goal of democratizing ML, but the technology is not without its flaws.

For instance, AutoML lacks flexibility and precision because it’s set up to accommodate a wide variety of datasets. This puts AutoML in the “good enough” category for most cases but leaves a lot to be desired if businesses are trying to design a solution to address a specific problem or achieve a specific goal. AutoML also falls into the increasingly common trap of “unexplainable AI”, in which trust in the decision-making ability of a model is brought into question because users can’t see the “working out” or understand how the algorithm is making certain decisions. 
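To make the idea concrete, here is a minimal sketch of the typical AutoML workflow, using the open-source FLAML library as one example (the article does not endorse any specific tool, and the dataset and time budget are chosen purely for illustration):

```python
# Minimal AutoML sketch using FLAML (one example library; dataset and
# time budget are illustrative). Model selection and hyperparameter
# tuning happen automatically inside fit().
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
# The user only states the task and a time budget; the search over model
# families and hyperparameters is hidden, which is both the appeal and
# the "unexplainable AI" risk discussed above.
automl.fit(X_train=X_train, y_train=y_train,
           task="classification", time_budget=60)

print(automl.best_estimator)        # which model family was chosen
print(automl.predict(X_test)[:5])   # predictions on held-out data
```

The trade-off is visible in how little the user specifies: convenient for a generic dataset, but hard to steer toward a very specific problem.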

Declarative ML

Declarative machine learning furthers the goals of AutoML, namely hiding the complexity of machine learning and reducing the amount of manual input. Using Declarative ML, users need only “declare” their overall data schema, such as the names and types of inputs, instead of writing low-level code. A user of a declarative ML system doesn’t need to know how to implement an ML model or pipeline, just as someone who writes an SQL query doesn’t need to know about database indexing and query planning. However, the declarative interfaces used have varied over time and across applications, making Declarative ML a challenge in and of itself. What’s more, gathering insight from declarative models and scaling them as a use case grows or expands is typically quite difficult. It’s very much a data-first approach, leaving businesses themselves to close the gap between data and actionable intelligence.
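As a rough illustration of what “declaring a schema” looks like, here is a sketch using Ludwig, one example of a declarative ML framework; the column names and file paths below are hypothetical:

```python
# Declarative ML sketch using Ludwig (one example framework; the columns
# and CSV paths are hypothetical). The user declares WHAT the inputs and
# outputs are, not HOW to build the model or training pipeline.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "star_rating", "type": "number"},
    ],
    "output_features": [
        {"name": "will_churn", "type": "binary"},
    ],
}

model = LudwigModel(config)
# Preprocessing, architecture, and training settings are inferred from
# the declared schema and the data itself.
train_stats, _, output_dir = model.train(dataset="customer_reviews.csv")
predictions, _ = model.predict(dataset="new_customers.csv")
```

The declaration is short, which is the point; but everything not declared is decided for you, which is where the scaling and insight problems described above creep in.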

Declarative ML and AutoML are useful in terms of data gathering, with the former offering more flexibility and customization, but neither is enough to satisfy an organization’s need for near real-time intelligence.

The best of both worlds  

The many iterations of AutoML and Declarative ML are strong options for making machine learning more accessible to businesses, but it’s becoming increasingly apparent that a more evolved, holistic approach is needed if data scientists are to take full advantage of ML. 

Organizations need a way to turn data into insight at the database level, reducing the number of steps needed to generate actionable information that can then be passed on to users or fed into an ML model. If pre-processing of data can start happening within databases themselves, ML models can be deployed and utilized instantly, broadening the scope of what businesses can achieve with machine learning. Similarly, businesses should look to combine AutoML and Declarative ML to access the full breadth of what ML can offer.
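A minimal sketch of the in-database idea, using SQLite purely for illustration (the table, columns, and model are all hypothetical): feature preparation happens in SQL inside the database, so the ML step receives ready-to-use features rather than raw rows.

```python
# Sketch of pushing feature preparation into the database itself.
# SQLite is used only for illustration; table and column names are
# hypothetical. The aggregation runs in SQL, and the model consumes
# the pre-aggregated features directly.
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL, returned INTEGER);
    INSERT INTO orders VALUES
        (1, 20.0, 0), (1, 35.0, 0), (2, 15.0, 1),
        (2, 10.0, 1), (3, 50.0, 0), (3, 5.0, 1);
""")

# Feature engineering done in-database: one row of aggregates per customer.
features = pd.read_sql_query("""
    SELECT customer_id,
           COUNT(*)      AS n_orders,
           AVG(amount)   AS avg_amount,
           MAX(returned) AS ever_returned
    FROM orders
    GROUP BY customer_id
""", conn)

# The ML step starts from insight-ready features, not raw records.
X = features[["n_orders", "avg_amount"]]
y = features["ever_returned"]
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```

The fewer hops data makes between the database and the model, the smaller the gap between reporting and actionable intelligence.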

AutoML and Declarative ML are useful and are here to stay, but a multi-faceted, combined approach is needed if businesses are to scale and develop ML in a way that closes the emerging gap between machine learning models and the needs of data science teams.