Ontology Hack – Make Use of Existing Enterprise Data Assets Instead of Starting from Scratch
As an author of a (reasonably) popular book, I often get asked questions about semantics, ontologies, and knowledge graphs by people who have read the book or perhaps have heard me speak at a conference. I quite welcome these questions since they give me an insight into how people from many industries and practices view semantics and knowledge graphs.
I got such a question from a young consultant who was about to begin a project with a major bank, building a knowledge graph. They wrote to me because they felt that an ontology would be key to success in the project and asked for some advice for getting started.
Right away, I saw some red flags. Indeed, an ontology plays a key role in a knowledge-based system. But, as my colleague Jim Hendler is often quoted as saying, a little ontology goes a long way. And there is a tendency, especially among young semantic consultants, to focus on their first ontology to the detriment of other aspects of the project.
Let’s face it; for many people, ontology modeling is fun. You get to talk to smart people who are passionate about their field, and you get to write down their knowledge, organize it, and give it a life of its own. For a curious life-long learner, this is the kind of thing you do for kicks.
The Role of an Ontology in an Enterprise Knowledge Graph
If we step back a moment and think about the role that an ontology has in an enterprise knowledge graph, it is to describe the common concepts that are shared among the myriad data resources in a large enterprise. For example: do many of our data resources refer to something that we call a “Patient” or maybe a “Company”? Then the ontology is the place to describe that concept. Does this enterprise need to understand fine distinctions between different kinds of patients? Or different forms of legal entities? Then the ontology is the place to record these differences. The ontology plays the role of a common reference point so that one part of the enterprise can know what another is talking about.
When we think of it this way, building an ontology by talking to people about the domain isn’t addressing the issues that a semantic layer needs to address. We must know what kinds of things the various data resources — and applications! — in the enterprise need to talk about, and what they need to say about them.
For this reason, many books on ontology engineering advocate a use-case-based approach to ontology engineering; connect the ontology to the business by identifying use cases for various business roles. This is good advice, as it’s a way to make sure that the ontology is responsive to the needs of the business. And that, after all, is why we are building the enterprise knowledge graph in the first place.
But I have a secret to share with you; in my practice, I have rarely followed a formal use-case approach, identifying roles and telling their user stories. Why not? Because ontology engineering didn’t invent use cases and user stories; a lot of other folks use them. Which folks? Enterprise application developers, data architects, and, probably most relevant to this story, data warehousers.
There are a lot of advantages to building a semantic layer and an enterprise knowledge graph over a traditional data warehouse. We won’t go into them here, but if you want just two words, let’s say agility and transparency. The basic problem they’re both addressing, however, is largely the same: we want something that will unify our understanding of a variety of data sources, presenting them all through a common viewpoint.
What does this mean for ontology engineering? It means that there’s a good chance, a really good chance, that someone has already identified use cases, user roles, and user stories, and expressed those in the form of a data warehouse or data lake. And furthermore, we can tell if they did a good job of it by how successful the warehouse has been.
The Shortcut to Your Ontology
It is quite common that the need for an Enterprise Knowledge Graph is perceived by someone who has been using one or more warehouses for a while; they did a good job in their day, but they lack the agility to keep up with the changing needs of the enterprise. There is a tendency to think that we just need to throw them out and start fresh. But there’s gold in those hills; those warehouses have captured something that the enterprise has found to be a useful commonality among a number of primitive data sources.
So here’s the shortcut, which I have used on many occasions; find one of these ‘legacy’ warehouses, and study its schema. Depending on the foundational technology of the warehouse, that schema may be easy to query or maybe not. But regardless of how much effort you need to get that schema, it is bound to be worth it. Yes, your first ontology can simply be an expression of the schema that you take from a successful — if a bit outdated — data warehouse.
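To make the shortcut concrete, here is a minimal sketch of one way to turn a warehouse schema into a first ontology. It is an illustration under stated assumptions, not a prescribed method: an in-memory SQLite database stands in for the legacy warehouse, the `Customer` and `Account` tables are invented, and the `wh:` namespace and the `schema_to_turtle` helper are hypothetical names. Each table becomes an OWL class, each plain column a datatype property, and each foreign key an object property linking two classes.

```python
import sqlite3

# Hypothetical namespace for the generated ontology terms.
BASE = "http://example.com/warehouse#"


def schema_to_turtle(conn: sqlite3.Connection) -> str:
    """Express tables as OWL classes and columns as properties, in Turtle."""
    lines = [
        f"@prefix wh: <{BASE}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        lines.append(f"wh:{table} a owl:Class .")
        # foreign_key_list rows: (id, seq, referenced_table, from_column, ...)
        fks = {row[3]: row[2]
               for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # table_info rows: (cid, column_name, type, notnull, default, pk)
        for _cid, column, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            prop = f"wh:{table}_{column}"
            if column in fks:
                # A foreign key links two classes, so it is an object property.
                lines.append(
                    f"{prop} a owl:ObjectProperty ; "
                    f"rdfs:domain wh:{table} ; rdfs:range wh:{fks[column]} .")
            else:
                lines.append(
                    f"{prop} a owl:DatatypeProperty ; rdfs:domain wh:{table} .")
    return "\n".join(lines)


# Stand-in for a legacy warehouse; the tables are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Account (
        account_id INTEGER PRIMARY KEY,
        balance REAL,
        owner INTEGER REFERENCES Customer(customer_id)
    );
""")
ontology = schema_to_turtle(conn)
print(ontology)
```

A real warehouse would expose its schema through its own catalog (for example, an information schema) rather than SQLite pragmas, and the naming scheme for properties is a design choice you would likely revisit; the point is only that the mapping from schema to ontology can be mechanical.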
This almost seems like cheating, but it satisfies many needs of a semantic layer. Right out of the box, it links to multiple data resources: all the sources that contribute to the warehouse. And don’t forget, the warehouse itself is a data asset, and you can link to that with no effort at all! This allows you to show some value right away; you can do whatever that warehouse did, but now the schema is a data asset in its own right, a data asset that you can query (transparency) and further develop (agility).
Your next move is to find the low-hanging fruit: what are the questions that the warehouse couldn’t answer? Was it difficult to integrate a new dataset? Or maybe there is a cross-reference between different business units that is still being done by hand? Use the agility of the ontology to add in new distinctions, new connections, or new data sources.
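As a rough illustration of what adding new distinctions and connections can look like, the sketch below holds a schema-derived ontology as plain (subject, predicate, object) triples and then extends it. Everything here is an assumption made for the example: the `wh:` terms, the retail/corporate split, the `LegalEntity` cross-reference, and the `subclasses_of` helper are all hypothetical, and a real project would use a triple store and SPARQL rather than Python sets.

```python
# Schema-derived ontology held as (subject, predicate, object) triples.
# All wh: names are hypothetical.
ontology = {
    ("wh:Customer", "rdf:type", "owl:Class"),
    ("wh:Account", "rdf:type", "owl:Class"),
    ("wh:Account_owner", "rdfs:domain", "wh:Account"),
    ("wh:Account_owner", "rdfs:range", "wh:Customer"),
}

# New distinction the warehouse schema did not express: kinds of customer.
ontology |= {
    ("wh:RetailCustomer", "rdfs:subClassOf", "wh:Customer"),
    ("wh:CorporateCustomer", "rdfs:subClassOf", "wh:Customer"),
}

# New connection: a cross-reference that used to be maintained by hand.
ontology |= {
    ("wh:LegalEntity", "rdf:type", "owl:Class"),
    ("wh:CorporateCustomer_legalEntity", "rdfs:domain", "wh:CorporateCustomer"),
    ("wh:CorporateCustomer_legalEntity", "rdfs:range", "wh:LegalEntity"),
}


def subclasses_of(triples, cls):
    """Return all direct and indirect subclasses of cls."""
    found, frontier = set(), {cls}
    while frontier:
        # Step one level down the subclass hierarchy, skipping classes
        # already seen so that a cycle cannot loop forever.
        frontier = {s for (s, p, o) in triples
                    if p == "rdfs:subClassOf" and o in frontier} - found
        found |= frontier
    return found


# Because the model is itself data, it can be queried like any other asset.
customer_kinds = sorted(subclasses_of(ontology, "wh:Customer"))
print(customer_kinds)
```

Nothing in the original triples had to change to accommodate the additions, which is the kind of agility a fixed warehouse schema tends to lack.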
Once you’ve done this, will you have achieved all the benefits of a knowledge graph? Will you have your data as a product, connected together into an enterprise data mesh? Certainly not; this is just a shortcut to getting started. But what you will have is a nice little knowledge graph that shows the value of this approach right away. It will have terms that the enterprise is already familiar with, and it will allow you to extend or even change those terms. It has a built-in connection to existing business practices. And if anyone in the business complains that the old warehouse is doing something wrong, well, now your business users are giving you unsolicited advice on what you need to develop next. This is the very definition of agility.
Yes, this is a “knowledge graph hack”, and eventually, you’re going to want to connect to some use cases, user stories, and other ways of organizing the development of your knowledge-based system; but you’ll be pretty far down the road, with the backing of a good portion of the business. And that always makes the going a lot easier.
Dean Allemang is the Principal Solutions Architect at data.world and author of Semantic Web for the Working Ontologist: Effective Modeling for Linked Data, RDFS, and OWL.