We live in a data-rich world. Very data rich. Indeed, it’s estimated that roughly 2.5 quintillion bytes of data are created every day.
Perhaps because of its ubiquity, there are those who believe the sheer volume of available data means we have all we need to easily and accurately answer any question without delay. If you can’t, they declare, you just need more data.
But if you already have a massive amount of data and you still can’t answer a question… is more data really what you need? At this point, it’s not for lack of data that you haven’t been able to solve your problem. So why would you believe that with more data, your problem is going to be solved?
To use a platitude often attributed to Einstein, “The definition of insanity is doing the same thing over and over again and expecting a different result.” And that’s what a data-first mindset is doing: driving us insane.
Here, I’ll explain why we need to move away from this data-first world, and why we need a paradigm shift away from the myopic focus on data and from answering every question with, “We need more!”
To maximize the value of all the data available to us, we need to move to a knowledge-first world, a world where we think about context, people, and relationships first.
Overfilling Your Data Lake with a Data-First Approach
If you’re stockpiling ever more data in an attempt to solve for stubborn use cases, you need to store that data somewhere. And that somewhere is usually a data lake. The fuller that lake becomes, the murkier your organization’s understanding of what’s in there, and of what it all means. When that happens, your data lake has become a data swamp.
A symptom of this pervasive problem is the widely seen shift from Extract, Transform, Load (ETL), the process for replicating data from source systems to target systems, to ELT: Extract, Load, Transform. Yes, the move to ELT saves time by allowing organizations to load data to destination systems without modeling it beforehand, but that often means the data is still not in a shape the target systems can use at the moment it’s needed. And that leads to business users with little data literacy scrutinizing raw data and saying, “What the hell is this? There’s so much data here… but I don’t know what I’m looking at.”
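To make the ETL versus ELT distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample rows, the field names, and the `transform`, `etl`, and `elt` helpers are hypothetical and do not come from any particular tool.

```python
# Hypothetical raw rows, as they might arrive from a source system.
RAW_ROWS = [
    {"cust": "A-17", "amt": "120.50", "ts": "2023-01-05"},
    {"cust": "B-02", "amt": "n/a", "ts": "2023-01-06"},
]


def transform(row):
    """Model one raw row: rename fields and parse the amount, or return None."""
    try:
        amount = float(row["amt"])
    except ValueError:
        return None  # unusable value; a real pipeline would log or quarantine it
    return {"customer_id": row["cust"], "amount": amount, "order_date": row["ts"]}


def etl(rows):
    """Extract, Transform, Load: only modeled rows reach the target."""
    target = []
    for row in rows:
        modeled = transform(row)
        if modeled is not None:
            target.append(modeled)
    return target


def elt(rows):
    """Extract, Load, Transform: raw rows land first; modeling is deferred."""
    lake = list(rows)  # loaded as-is, cryptic field names and all

    def transform_later():
        modeled_rows = (transform(r) for r in lake)
        return [m for m in modeled_rows if m is not None]

    return lake, transform_later
```

In the ELT path, anyone who opens `lake` before `transform_later` has run sees `cust`, `amt`, and `ts` with no explanation, which is the “What the hell is this?” moment in miniature.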
And this is the problem with our data-first world: the disconnect between the data itself and the valuable knowledge that data can provide.
Knowledge-First Can Save Us
In a knowledge-first world, you approach your data with a people-first, relationship-first, and context-first perspective. Instead of firing over massive amounts of confusing, raw data, consider:
- Who needs to consume the data? (People)
- Why do they need to consume it? What use case are they trying to solve for? (Context)
- How is this data related to other data and people? (Relationships)
Then, when you’ve answered those questions, you need to ensure your transformed data can be understood… and understood by business users who may not have the technical chops of your data team. These are the first questions to consider in order to start treating data as a product.
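One way to keep the three questions above attached to a dataset is to record the answers alongside it. The sketch below is only an illustration under stated assumptions: `DataProductSpec`, its fields, and the example values are hypothetical names invented here, not part of any catalog or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    """Hypothetical record answering the three knowledge-first questions."""

    name: str
    consumers: list[str]  # People: who needs to consume the data?
    use_case: str  # Context: why do they need it?
    related_to: list[str] = field(default_factory=list)  # Relationships

    def missing_answers(self) -> list[str]:
        """Return the questions that are still unanswered."""
        missing = []
        if not self.consumers:
            missing.append("people")
        if not self.use_case.strip():
            missing.append("context")
        if not self.related_to:
            missing.append("relationships")
        return missing


# Hypothetical usage: consumers are known, but not the use case or relationships.
spec = DataProductSpec(name="qualified_leads", consumers=["sales ops"], use_case="")
print(spec.missing_answers())
```

A check like `missing_answers` is a design choice, not a requirement. The point is simply that the people, context, and relationships get written down before the data is handed over.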
This is where modeling and semantics, and with them knowledge, take center stage. And this is where it’s crucial for data experts to be business literate, or to have a business-literate teammate to translate.
Teams Must Be Data and Business Bilingual to Succeed
A data-first world is focused on data literacy, a topic that’s been discussed ad nauseam in our industry over the past 20 years. We’ve hammered on the importance of teaching business users how to analyze datasets to get maximum value from the organization’s data, from the executive level on down. But the onus has been on the business users for too long, and a massive amount of value has been lost because of it. To really tap into the value of your data, it has to be a two-way street.
A knowledge-first world is focused on business literacy. And in order for data teams to return maximum value for their organizations, we’re gonna have to go to school.
Right now, the disconnect between data teams and business teams means that the first might not understand the business, while the second might not understand the data. Data literacy has become a near-crucial skill for business leaders; going forward, business literacy will become just as important for data leaders. How does our sales pipeline work? What do we consider a marketing qualified lead? What in the heck is a BDR? To answer questions like these, your team will need to talk to the people in your organization for whom they’re a daily priority.
And once these questions, and many more, can be answered by your business-literate data team, you’ll start gaining the context you need to deliver the data your business users need to drive business success.
More importantly, you’ll be able to deliver not just data, but knowledge, in a knowledge-first world.
Juan Sequeda is the Principal Scientist at data.world, and co-host of the weekly honest, no-BS data podcast, Catalog and Cocktails.