My grandmother was born in a country called Yugoslavia. As you may know, Yugoslavia was one country with seven borders, six republics, five nationalities, four languages, three religions, and two alphabets. So, I used relatively sophisticated words like "Balkanization" at a fairly young age; and I still use it gladly to qualify the technological world of data infrastructure today.
Historically, the term Balkanization was not used to refer to Yugoslavia itself, but rather in reference to the Ottoman Empire and Austro-Hungarian Empire in the 19th century. These two empires, in fact, at the end of the Napoleonic wars, covered a large area of Europe, including countries such as today's Romania, Bulgaria, Greece, Croatia etc. As the 19th century progressed, successive fragmentations of empires led to the creation of autonomous states, which were sometimes united against their common enemies (the two great empires), sometimes hostile towards each other. The evolution of these republics occurred under pressure from the major powers, particularly Russia, always looking for strategic access to the Mediterranean.
If one dared to make a bold comparison, the world of Business Intelligence of the 90s, dominated by a few solutions (Cognos from IBM, Business Object from SAP, Oracle, SAS) fragmented starting in the early 2000s under the pressure of new solutions, for the most part open source. These Open Source solutions were developed in large part by stakeholders of the Internet revolution (Yahoo, Facebook, Twitter, LinkedIn) who chose the open source way for the construction of infrastructure under the pressure of a dominant Google.
This is where our comparison stops, because there is no war or conflict between the various emerging open source platforms. However, there is still a great heterogeneity.
Imagine for a moment: you are starting with a new business, and you are responsible for mounting a data platform from scratch. Besides interesting questions of recruitment and business goals, you have several technology platform questions. With tradeoffs particular to each area:
We thus come to eight technical areas. For each of them, there are a number of attractive technologies for taking advantage of them
For the purpose of this exercise, we will limit our list to the most well-known solutions, accessible free of charge (at least partly)
The Technoslavia math is easy to do. By multiplying the possibilities, we have not 6 x 8 = 48, but rather 6 to the 8th power = 1,679,616 possible architectures!
If 1,000 companies tested each combination in parallel, one by one, at a rate of 1 evaluation per day, it would take about five years to test them all.
Many of the technologies mentioned in the table above are less than 5 years old, and in 5 years, a number of them will be outdated. In short, the Big Data architect of today should reread the myth of Sisyphus, and be satisfied with continual trials and self-reinvention.
(Note that "We must imagine Sisyphus happy" is a notion that Albert Camus borrows from Baron Kuki Shuzo, Japanese philosopher, student of Heidelberg and contemporary of Bergson)
However, there are not so many possibilities as that. A few typical architectures are possible:
Alongside these solutions, cloud platform providers (AWS, Google Cloud, Azure) are also launching their own platforms, seeking to provide integrated platforms from start to finish. For this, they provide products for each area:
A time of similar heterogeneity marked the early 80s for data storage technologies. From this, emerged the paradigm of the relational database and the SQL language that unified it all.
What is the future in the world of data: unified by a single language and the victory of one player over the others? Will the multiple republics of data will soon give birth to a new Empire?
Several points leave room for doubt. The current Big Data ecosystem is very different indeed from the "Data" ecosystem of the 80s:
This is why we at Dataiku believe that Technoslavia is built to last! And, indeed, that's why we have created Data Science Studio.
In their furious crossing of Technoslavia, Data teams need a reassuring guide, a tool, always on the frontier with the latest technologies, but which allows all profiles to quickly seize it.
It's an endless battle, but it's one we will take up!
About the Author:
Florian Douetteau is CEO at Dataiku. Originally from ex-Yougoslavia, he now lives in New York. Florian graduated from Ecole Normale Supérieure, one of the most prestigious schools in France.