Summary: NewSQL is alive and well and, under the right circumstances, could be your best choice.
No, this is not a misprint. Yes, we mean NewSQL, not NoSQL. Recently a colleague asked me about NewSQL and I had to admit that I hadn’t kept up. It got a lot of press initially but gets much less now. Is it still relevant? And if so, under what circumstances should you consider it?
During the early adolescence of our new Big Data revolution around 2009, there were still a few flies in the ointment of the revolutionary NoSQL architecture and features. At the time these fell into three major categories:
NoSQL was good at OLAP but less so at OLTP.
Eventual not immediate consistency.
The absence of SQL.
A number of innovators, seeing these challenges as an opportunity, took a sharp right turn from the NoSQL mainstream and christened their new path NewSQL. These included NuoDB, MemSQL, VoltDB, and Altibase, among others.
NewSQL started from the premise that SQL and the relational model are well known and accepted. More important, the relational model itself is not a limitation on scale; rather, it’s the physical implementation of the model that has limitations. So NewSQL DBs retained all the aspects of the relational model, including the predefined schema and the use of SQL, and sought to solve the problem of scale by following one or both of two paths: in-memory storage and clever distributed architecture. And it worked.
What happened to the original shortcomings they were trying to resolve?
As the market changed and the NewSQL providers moved forward, some elements of the original problem also changed.
OLAP versus OLTP
This can start to get confusing because Gartner, the king of naming conventions, changed this category name in 2014 from OLTP to ODBMS (Operational Database Management Systems). Why? Because NoSQL and NewSQL offerings that had previously been “overwhelmingly supplemental to traditional relational DBMS deployments, not destructive” per Nick Heudecker, Research Director – Information Management at Gartner, are now seen as viable contenders to replace relational systems in the future.
Heudecker says, “Going forward, we see the bifurcation between relational and NoSQL DBMS markets diminishing over time.” All of which means that the NoSQL and relational DB markets, with NewSQL squeezed in the middle, are continuing to converge, share features, and become increasingly competitive as direct replacements for one another. In short, the lead that NewSQL and relational DBs held in OLTP is going away. The Gartner Magic Quadrant for ODBMS currently shows 25 vendors on its chart, with traditional RDBMS, NewSQL, and NoSQL vendors right alongside each other.
Eventual not immediate consistency
Briefly, the problem of consistency arises in NoSQL because the shared-nothing, massively parallel processing (MPP) architecture replicates each incoming transaction, typically on three or more identical nodes. In the beginning NoSQL could not guarantee that all of the replicas would agree if they were read at the same time; they might differ for anywhere from a few milliseconds to a few seconds. For many applications that was perfectly acceptable, but less so for applications like inventory balances (one node says the product is available but another says it’s out of stock), and pretty much completely unacceptable for financial transactions, where multiple incoming deposits, transfers, and withdrawals against an account could theoretically show different balances depending on which node was queried and in what order the transactions had arrived.
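The inventory example can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: the class names, node names, and the "queue writes for the other replicas" mechanism are all hypothetical and are not taken from any particular NoSQL product.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a copy of the inventory (hypothetical model)."""
    name: str
    stock: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # writes received but not yet applied

    def apply_pending(self) -> None:
        # Replication catches up: apply queued writes in arrival order.
        for sku, qty in self.pending:
            self.stock[sku] = qty
        self.pending.clear()


class EventuallyConsistentStore:
    """A write lands on the first replica at once and is queued for the others."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]

    def write(self, sku: str, qty: int) -> None:
        primary, *others = self.replicas
        primary.stock[sku] = qty
        for replica in others:
            replica.pending.append((sku, qty))

    def read_all(self, sku: str) -> dict:
        # Ask every node what it currently believes.
        return {r.name: r.stock.get(sku) for r in self.replicas}

    def converge(self) -> None:
        for replica in self.replicas:
            replica.apply_pending()


store = EventuallyConsistentStore(["node-a", "node-b", "node-c"])
store.write("widget", 5)
store.converge()                         # all nodes agree: 5 in stock
store.write("widget", 0)                 # the last units are sold
stale_view = store.read_all("widget")    # read before replication finishes
store.converge()
fresh_view = store.read_all("widget")    # read after replication finishes
print(stale_view)
print(fresh_view)
```

In the window between the write and convergence, node-a reports the product as out of stock while node-b and node-c still report five units, which is exactly the disagreement described above.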
This was an area in which NewSQL excelled, since its architecture had retained full ACID compliance and could guarantee complete agreement among multiple nodes. The competitive problem is that pretty much all the leading NoSQL DBs have attacked this problem with a vengeance and now advertise immediate consistency. If this is a particular concern for your app, it’s still best to investigate in depth, but NewSQL, being ACID, will always be consistent, and with very low latency.
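For readers who want to see what the ACID guarantee buys in the financial case, here is a minimal sketch of an atomic transfer. It uses Python's built-in sqlite3 module purely as a stand-in relational engine; SQLite is not a NewSQL database, and the accounts table, the account names, and the transfer function are assumptions made for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on overdraft."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

ok = transfer(conn, "alice", "bob", 70)        # succeeds
rejected = transfer(conn, "alice", "bob", 50)  # would overdraw, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(ok, rejected, balances)
```

The rejected transfer leaves no trace: both the debit and the credit are undone together, so no reader ever sees a half-applied transaction. A NewSQL system aims to give the same all-or-nothing behavior across multiple nodes rather than within one local file.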
The absence of SQL
It’s been widely discussed that the inability to use SQL on NoSQL DBs effectively shut out a lot of talented developers, DBAs, and analysts unless they were willing to reskill. In the last 12 months we’ve seen a real flurry of rollouts as NoSQL DBs announce their compatibility with ANSI SQL. In Hadoop this has come with projects like Drill and Spark, so this advantage for NewSQL is fast disappearing.
A preference for a predefined schema
Of course, since NewSQL is a relational database, it continues to rely on a predefined schema. NoSQL will continue to tout the flexibility and benefits of no-schema, or more correctly late-schema. But if you are an IT professional with no current plans to work in unstructured data or petabyte volumes, the ability to retain your familiar schema and the routines necessary to maintain it is no doubt reassuring.
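The difference between a predefined schema and late-schema can be sketched in a few lines. This is an illustrative contrast only: sqlite3 stands in for any relational engine, a plain Python list of JSON strings stands in for a document store, and the customers table and helper names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def insert_row(row):
    """Predefined schema: the database rejects rows that violate it at write time."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customers (id, email) VALUES (:id, :email)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


documents = []  # stand-in for a schemaless document collection


def insert_document(doc):
    """Late schema: any serializable document is stored; readers interpret it later."""
    documents.append(json.dumps(doc))
    return True


good = insert_row({"id": 1, "email": "a@example.com"})
bad = insert_row({"id": 2, "email": None})  # violates NOT NULL, so it is refused
loose = insert_document({"id": 2, "nickname": "no email field"})
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(good, bad, loose, row_count)
```

The relational side pushes validation to write time, which is the familiar routine the paragraph above refers to; the document side accepts the record and leaves it to every reader to cope with the missing field.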
Basic NewSQL selection criteria
First of all, NewSQL is relational and therefore based on structured data. If you are intent on working in unstructured or semi-structured data, you will continue to look to NoSQL. Disclaimer: NewSQL DBs are mostly JSON and XML compliant and could handle some semi-structured data apps, provided the data is compatible with a structured relational DB.
Also, NewSQL vendors are targeting large but ‘ordinary’ gigabyte-range transactional and occasionally data warehouse operations. Petabyte or larger DBs continue to be the domain of NoSQL.
For some narrowly defined high-velocity data applications the in-memory NewSQL DBs will work, but the disk-based solutions probably less so.
So what are the remaining benefits for NewSQL?
Scale-out performance: Traditional databases can’t deliver capacity on demand, or at least not economically. This can become a barrier to development as the focus shifts to scaling workarounds like partitioning, sharding, and clustering. Another common approach is to add larger machines at more cost. NewSQL solutions running multiple replica nodes can scale elastically at lower cost, and this architecture allows for the use of commodity hardware, also at lower cost.
Continuous availability: 24/7 uptime is much easier to achieve in an architecture with multiple replica nodes providing failover protection. In traditional database systems this continues to be a concern and leads to the use of expensive hardware or complicated failover systems. So NewSQL wins here and at lower cost.
Geo-distribution: Here’s one you might not have thought of. If you are an international business with data centers in many countries, you may need the ACID compliance with low latency that NewSQL can offer. And in today’s highly politicized data world, where the data on German citizens must remain in Germany and the same holds for many other countries, NewSQL lends itself to running a single logical database distributed across multiple countries, and it facilitates the logic necessary to meet the legal requirements for in-country storage of information about each country’s nationals.
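To picture what "a single logical database across multiple countries" means, here is a rough sketch of residency-aware row placement. Everything in it is hypothetical: the region names, the residency mapping, and the GeoPartitionedTable class are invented for illustration and do not describe any vendor's actual mechanism.

```python
from collections import defaultdict

# Hypothetical residency rules: citizen's country -> region that must hold the row.
RESIDENCY_REGION = {"DE": "eu-germany", "FR": "eu-france", "US": "us-east"}


class GeoPartitionedTable:
    """One logical table whose rows are physically pinned to a home region."""

    def __init__(self, residency):
        self.residency = residency
        self.regions = defaultdict(dict)  # region -> {customer_id: row}

    def insert(self, customer_id, country, name):
        region = self.residency[country]  # raises KeyError if no rule exists
        self.regions[region][customer_id] = {"country": country, "name": name}
        return region

    def select_all(self):
        # Logical view: callers see one table regardless of physical placement.
        rows = []
        for region in sorted(self.regions):
            for customer_id, row in sorted(self.regions[region].items()):
                rows.append((customer_id, row["name"], region))
        return rows


customers = GeoPartitionedTable(RESIDENCY_REGION)
placed_de = customers.insert(1, "DE", "Anna")
placed_us = customers.insert(2, "US", "Bob")
all_rows = customers.select_all()
print(placed_de, placed_us)
print(all_rows)
```

The application issues ordinary inserts and queries against one table, while the placement rule decides where each row physically lives. A real distributed database would also have to handle replication, transactions, and cross-region query latency, none of which this sketch attempts.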
What about the learning curve?
Since this is really just another relational database, can I just throw it over the wall to my current staff? What’s the learning curve like? In conversation, one leading NewSQL provider said there is some learning curve due to the multi-tier architecture, but it is not as steep or as different as with NoSQL.
So the answer is yes, NewSQL is alive and well and worth your consideration. It lies at the rapidly narrowing nexus of relational and NoSQL DBs. And if you decide to take it for a spin, you’ll be in the good company of these market leaders who are NewSQL users: Dassault Systems, Ericsson, HP, Shopzilla, Comcast, Samsung, Zynga, Ziff Davis, and Boeing.
June 10, 2015
Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2015, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. Bill is also a Senior Contributing Editor for Data Science Central. He can be reached at: