While knowledge graph hype is nowhere near as loud as AI hype, there is no question that more and more organizations are turning to knowledge graphs to solve real-world problems. However, just as with any data solution, there are times when, after the initial acquisition of a knowledge graph solution, companies – and IT managers in particular – wonder what exactly it is they have acquired. All too often, the result is a knowledge graph that sits largely under-utilized because no one can figure out what it is for.
Knowledge graphs can make a big difference, but you need to understand their nature going in and be willing to commit to the project for the long haul. A knowledge graph is, in many ways, a garden: something that you plant and carefully tend, with the dividends paid out over years rather than all at once. Understand from the outset that a knowledge graph is an investment that will produce great rewards, but will take time to grow.
To that end, in planning for a knowledge graph solution, there are several key points to consider:
A Knowledge Graph is an Integration Platform
In most organizations, integration is one of the biggest headaches any IT manager will face. The services era has left many companies with dozens or even hundreds of different data services, each describing the core objects in use within one segment of the business, often overlapping the same offerings from others. Services management tools are at best stopgaps: they can better coordinate between different data sources, but ultimately they only buy time until the company finds a better solution.
A knowledge graph, on the other hand, is a true integration platform. It can take information from multiple data sources (as well as content originating in the knowledge graph itself) and map it into a central information space described by an ontology. Once there, the information can be transformed into other formats as needed. Unlike a traditional ERP or data warehouse, however, everything within a knowledge graph connects to everything else, rather than simply being stored side by side. This approach takes more work initially – you do have to do some modeling a priori – but creating an internally consistent representation of your business pays off.
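The mapping step can be illustrated with a toy sketch in plain Python (no RDF library): two differently-shaped source records are mapped onto one shared vocabulary, with triples represented as tuples. All of the names here (the CRM and ERP record shapes, the `ex:` prefix) are hypothetical, invented for illustration.

```python
# Toy sketch: two heterogeneous sources mapped to one central vocabulary.
# Triples are plain (subject, predicate, object) tuples.

def from_crm(record):
    """Map a CRM-style record into the central ontology's terms."""
    s = f"ex:customer/{record['cust_id']}"
    return {
        (s, "rdf:type", "ex:Customer"),
        (s, "ex:name", record["full_name"]),
    }

def from_erp(record):
    """Map an ERP-style record (different field names) to the same terms."""
    s = f"ex:customer/{record['id']}"
    return {
        (s, "rdf:type", "ex:Customer"),
        (s, "ex:name", record["customer_name"]),
    }

graph = set()
graph |= from_crm({"cust_id": "42", "full_name": "Acme Corp"})
graph |= from_erp({"id": "42", "customer_name": "Acme Corp"})

# Because both mappers target the same identifiers and predicates, the two
# records merge into a single set of facts about one entity: len(graph) is 2.
print(len(graph))
```

The design point is that each mapper is small and source-specific, while everything downstream of the ontology sees one consistent shape.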
This does change how you think about data management. In essence, the knowledge graph acts as a global index for persisted information. Your applications do not need to query multiple data sources every time a request is made – the knowledge graph engine queries each source once, then periodically checks to see whether that information has changed.
What's more, you can customize the output from the knowledge graph so that it reflects the ontology (data model) desired by a client rather than the ontology that is stored internally. This is an important point and one that often gets lost.
Knowledge Graphs Are Not Necessarily Linked Data
Knowledge graphs have historically differed from other linked data repositories in that they tend to focus on institutional knowledge. This shouldn't be surprising: while most organizations have many features in common, it is what is not common that most typically defines their datasets.
Knowledge graphs typically will not be accessed directly (via SPARQL) from outside the organization in question. Instead, they will expose common APIs representing access to resources that can be styled in specific ways. This provides a modicum of security for the data (which may contain sensitive information) and simplifies many aspects of knowledge graphs that can be problematic (such as namespace management). The (arguable) downside is that, for queries outside a core set that can be made programmatically broad, access to the underlying knowledge graph is available only at a specialist level – those building the APIs and those generating reports from them.
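The API-over-graph pattern can be sketched in a few lines. Here a toy in-memory graph sits behind a single named, parameterized query; callers can ask that question and nothing else. The graph contents and endpoint name are invented for illustration.

```python
# Toy sketch: a graph hidden behind a named, parameterized query instead
# of a raw SPARQL endpoint.

GRAPH = {
    ("ex:p1", "ex:worksFor", "ex:acme"),
    ("ex:p2", "ex:worksFor", "ex:acme"),
    ("ex:p1", "ex:name", "Ada"),
    ("ex:p2", "ex:name", "Grace"),
}

def employees_of(org):
    """Named query: the only way callers can ask this question."""
    people = {s for (s, p, o) in GRAPH
              if p == "ex:worksFor" and o == org}
    return sorted(
        o for (s, p, o) in GRAPH if p == "ex:name" and s in people
    )

# Callers get a stable, bounded API; the graph's internal structure,
# namespaces, and sensitive triples stay private.
print(employees_of("ex:acme"))  # ['Ada', 'Grace']
```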
This notion may be anathema to those who believe in the Linked Data approach. Still, from a utility standpoint, the more restrictive knowledge graphs are also easier to use as they don’t require a deep understanding of how information is modeled.
This also shifts the focus of interoperability away from the core ontology and towards the periphery, with ingestors mapping to the internal ontology and producers mapping that ontology to external schemas and ontologies, as appropriate.
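The producer side of that split can be sketched as a small mapping layer: internal triples are re-shaped into an external, client-facing schema (here a schema.org-flavored dictionary). The internal predicates, the export map, and the target property names are all hypothetical.

```python
# Toy sketch: a "producer" at the periphery maps the internal ontology
# out to an external schema expected by a client.

INTERNAL = {
    ("ex:c42", "rdf:type", "ex:Customer"),
    ("ex:c42", "ex:name", "Acme Corp"),
}

# Internal predicate -> external property name; internal class -> external type.
EXPORT_MAP = {"ex:name": "name", "rdf:type": "@type"}
TYPE_MAP = {"ex:Customer": "Organization"}

def export(subject):
    """Re-shape one subject's triples into the client-facing schema."""
    doc = {"@id": subject}
    for (s, p, o) in INTERNAL:
        if s == subject and p in EXPORT_MAP:
            doc[EXPORT_MAP[p]] = TYPE_MAP.get(o, o)
    return doc

doc = export("ex:c42")
# doc == {'@id': 'ex:c42', '@type': 'Organization', 'name': 'Acme Corp'}
```

The internal ontology never leaks; only the mapping tables change when a client wants a different external shape.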
This decision has several key consequences. The first is that the internal (or private) ontology can be used to manage mappings, master data, and operational logic without relying on some other organization's code or priorities. Many people will object to this particular point, as it has become canon in semantic circles that you should always build on top of standards for maximum interoperability between systems.
However, this best practice seldom squares with reality. Knowledge graphs pull information from multiple sources, of which only a microscopically small subset comes in through RDF, and an even smaller subset comes in via consistent RDF. On one recent project I consulted on, the data sources included SKOS, Gist, FIBO, CSV, several different flavors of JSON, XML documents, and more. The graph was queried via JSON-LD and GraphQL, or through named queries set up so that people would not have direct SPARQL access except in a few very limited cases.
This is not to say that SPARQL is a bad language. Indeed, it’s quite powerful in what it can do. Therein lies its problem – direct SPARQL access is too powerful (and arguably too complex) for casual users, and most successful knowledge graph projects hide that power behind interfaces designed to protect the system and guarantee reasonable performance. At that point, the interoperability argument falls flat. By extension, the need to create hybrid ontologies becomes less important than building an operational ontology optimized for querying and updating.
To Infer or Not To Infer, That Is the Question
Inferencing is one of the big flashpoints in any discussion about semantics. Inference – using complex logical rules to constrain the production of triples that surface "hidden" knowledge – sounds great in principle, but it generally assumes that the graph in question is wholly declarative. The closest analogy I can think of is a very complex spreadsheet with a great number of constraints (a spreadsheet is a graph, naturally), where, by controlling a select number of input triples as parameters, you can create an inferential model based primarily on predicate logic. In this regard, an inferential graph is a model in much the same way that a machine learning system built upon gradient descent is a model.
There are many great use cases for inferential systems: fraud analysis and detection, complex system modeling, financial modeling, genetic processing, and so on. Again, however, these make up a comparatively small percentage of all potential applications for semantic graphs. Once you get to the level of knowledge graphs, I'd argue that inferencing can prove counterproductive.
The primary reason is that knowledge graphs are used either for integration (mapping one ontology to another, for which inferencing provides comparatively little benefit) or for publishing (in which case a combination of SPARQL and SHACL is likely to be more consistent with the way people work with these systems). For instance, in OWL, asserting that a certain property's sub-property has a cardinality of exactly one requires a chain of constraints. Querying this from SPARQL is surprisingly complicated because multiple potential paths may satisfy the constraint, and you are reliant upon the system to act as a check on such usage. With SPARQL in conjunction with SHACL, on the other hand, the query is straightforward, and SPARQL can apply internal logic to perform specific actions depending on whether the constraint is matched.
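To make the contrast concrete, here is a SHACL-style "exactly one" check written as plain code: validate that every instance of a class carries exactly one value for a property, and report the violators. This mimics what a SHACL shape with `sh:minCount 1` and `sh:maxCount 1` declares; it is an illustration in toy Python, not a SHACL engine, and all the names are invented.

```python
# Toy sketch of a SHACL-style cardinality constraint: every ex:Person
# must have exactly one ex:ssn.

GRAPH = {
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "ex:ssn", "111-11-1111"),
    ("ex:p2", "rdf:type", "ex:Person"),   # no ssn: violates the shape
}

def check_exactly_one(graph, cls, prop):
    """Return (subject, count) pairs that break the exactly-one rule."""
    violations = []
    targets = [s for (s, p, o) in graph if p == "rdf:type" and o == cls]
    for t in targets:
        count = sum(1 for (s, p, o) in graph if s == t and p == prop)
        if count != 1:
            violations.append((t, count))
    return violations

print(check_exactly_one(GRAPH, "ex:Person", "ex:ssn"))  # [('ex:p2', 0)]
```

The check is a direct query-and-count, which is the essential simplicity the SHACL/SPARQL combination offers over inferring the same fact through an OWL constraint chain.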
Similarly, inserting new complex objects (as opposed to simply triples) into a knowledge graph is hard to do with an OWL-based inferential system. It is much more straightforward with the emerging SHACL/SPARQL stack. This publishing model also does not necessarily assume that such constructs are all internally consistent (for instance, they could be in draft states where specific property values have not yet been fully specified).
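The draft-friendly insertion pattern might look like the following sketch: a complex object goes in as a batch of triples, and required-property checks are relaxed while its status is "draft" and enforced only at publication. The `ex:status` convention and the class/property names are hypothetical.

```python
# Toy sketch: inserting complex objects that need not be internally
# complete while in a draft state.

REQUIRED = {"ex:Article": ["ex:title", "ex:author"]}

def insert_object(graph, subject, cls, props, status="draft"):
    """Insert an object as triples; enforce REQUIRED only when published."""
    if status == "published":
        missing = [p for p in REQUIRED.get(cls, []) if p not in props]
        if missing:
            raise ValueError(f"{subject} is missing {missing}")
    graph.add((subject, "rdf:type", cls))
    graph.add((subject, "ex:status", status))
    for p, o in props.items():
        graph.add((subject, p, o))

g = set()
insert_object(g, "ex:a1", "ex:Article", {"ex:title": "Drafts"})  # ok: draft
try:
    insert_object(g, "ex:a2", "ex:Article", {"ex:title": "X"},
                  status="published")
except ValueError:
    print("rejected: a published article needs an author")
```

An OWL reasoner would flag the draft as inconsistent the moment it entered the graph; treating completeness as a per-state constraint is what makes the publishing workflow workable.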
Moving forward, this publishing scenario is likely to dominate the use of knowledge graphs. Taxonomy management, web publishing, true semantic wikis, search optimization, and data hubs/catalogs are all areas where knowledge graphs could play big roles, but to do so requires thinking about how best to deploy such systems.
Publishing and Knowledge Portals
A term that is gaining currency in the industry is the knowledge portal. The knowledge graph is just that – a graph – holding the various data, metadata, and operational graphs necessary to describe a publication system, while the knowledge portal is the application itself: a web-based publishing system, driven by an RDF quad-store, that uses the knowledge graph as its data source and sink.
Publishing systems are more complex than knowledge graphs in some respects, but because publishing is generally perceived as a "solved" domain, knowledge graphs are not immediately obvious as publishing solutions. Yet when you look at most modern publishing systems today, one thing that emerges is that they hew closely to a semantic model.
A publishing system typically has content types that display pages of content formatted to those types – articles, videos, essays, blog posts, etc. These are, at their core, classes and instances. Search is the process of querying, with the search results being links to specific content given as URLs. This is exactly what a semantic portal does.
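That correspondence – content types as classes, items as instances, search as a query returning URLs – can be sketched directly. All of the content, types, and URLs below are invented for illustration.

```python
# Toy sketch: a publishing system's content model expressed as triples,
# with search as a query that returns content URLs.

GRAPH = {
    ("ex:post-1", "rdf:type", "ex:BlogPost"),
    ("ex:post-1", "ex:title", "Graphs in the Garden"),
    ("ex:post-1", "ex:url", "/posts/graphs-in-the-garden"),
    ("ex:vid-1", "rdf:type", "ex:Video"),
    ("ex:vid-1", "ex:title", "Graph Modeling 101"),
    ("ex:vid-1", "ex:url", "/videos/graph-modeling-101"),
}

def search(graph, term):
    """Return URLs of items whose title contains the term."""
    hits = {s for (s, p, o) in graph
            if p == "ex:title" and term.lower() in o.lower()}
    return sorted(o for (s, p, o) in graph if p == "ex:url" and s in hits)

print(search(GRAPH, "graph"))
# ['/posts/graphs-in-the-garden', '/videos/graph-modeling-101']
```

Rendering a results page is then just templating over the query's output, which is exactly the class/instance/query trio described above.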
You can even create small "localized" queries that are displayed as blocks using the same kind of querying, though such queries can get quite deep (and hence slow) using traditional publishing tools. Applications like Drupal and WordPress can even double as basic taxonomy management systems, but this requires some fairly complex programming – programming that could be vastly simplified using knowledge graphs.
This becomes even more important as data fabrics and digital twins become more deeply embedded into companies because, in most cases, the knowledge portal is both the entry point for adding new content to the organization and the mechanism for converting the digital products of this process into either actions or physical production. Indeed, 3D printing can be seen as simply another publishing channel for knowledge portals, as can virtual reality simulations, metaverse interaction, and gaming.
As such, knowledge portals should be seen as the evolution of the knowledge graph: a vehicle for integrating external streams, adding internal streams and taxonomies, publishing objects as documents, and managing the evolution of those objects over time.