My mentors and compadres at Semantic Arts hold a Data-Centric Architecture Forum (DCAF) every year in Fort Collins, Colorado. It’s a chance for technically inclined data and data modeling enthusiasts to brainstorm about the gaps in the architecture and how to fill them. This year’s DCAF takes place on June 6–8, 2022.
It’s helpful to list the philosophies behind the data-centric approach and remember why they’re so essential. Many people don’t have the time to learn how intertwined different innovations are, or why a system design approach has to bring them together to solve root problems. If companies can’t grow and harvest their data organically and harness the power of web-scale integration, that data won’t deliver the return on investment it otherwise could.
Core data-centric architecture philosophies
|Data centricity||Application centricity||The workforce tends to think in terms of applications rather than data, and people who work with data tend to work with it in tables, via applications. Mentalities have to change so that data is the prime mover. Application sprawl creates a deluge of silos and duplicated imperative code. A key goal of data-centric architecture is to build in the capability to share and reuse data and the code that’s associated with it. In the process, data becomes more connected, meaningful and powerful.|
|Explicit knowledge||Cryptic data||Data.world’s Juan Sequeda reminds his customers to make sure their data is meaningful, modeled and documented. If its context isn’t explicit, there will be fewer points of connection.|
|Semantic graphs (or multi-model equivalent) for integration and interoperation||Methods that don’t scale or improve, such as data warehousing (See Luke Feeney’s comparison at right.)|
|Ontology model-driven apps||Apps that reinvent the wheel||Most of the “code” that enterprises run on could be “data” in the form of declarations and rules that live and evolve symbiotically with the data in a knowledge graph.|
|Go cross-enterprise||Stay in your own bubble||Knowledge graphs are designed for sharing services. Much of the optimization of supply chains and government departments hinges on effective collaboration between the organizations in a given chain.|
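To make the "code as data" idea in the model-driven row more concrete, here is a minimal sketch in plain Python. It is not any particular product's or standard's API: the triples, the `ex:` names and the rule format are all hypothetical, chosen only to show a business rule stored as a declaration that a small generic engine interprets, rather than as logic hard-coded into each application.

```python
# Hypothetical example: a validation rule kept as data, not as application code.
# Triples are (subject, predicate, object) tuples; every name is illustrative.
graph = {
    ("ex:order1", "rdf:type", "ex:Order"),
    ("ex:order1", "ex:hasCustomer", "ex:acme"),
    ("ex:order2", "rdf:type", "ex:Order"),
}

# Each rule is a plain record that could live alongside the data it governs.
rules = [
    {"applies_to": "ex:Order", "required_property": "ex:hasCustomer"},
]


def violations(graph, rules):
    """Return (subject, missing_property) pairs for every rule that fails."""
    found = []
    for rule in rules:
        # Find every subject declared to be of the class the rule targets.
        subjects = sorted(
            s for s, p, o in graph
            if p == "rdf:type" and o == rule["applies_to"]
        )
        for subject in subjects:
            has_property = any(
                s == subject and p == rule["required_property"]
                for s, p, _ in graph
            )
            if not has_property:
                found.append((subject, rule["required_property"]))
    return found


print(violations(graph, rules))  # [('ex:order2', 'ex:hasCustomer')]
```

Changing the business requirement here means editing the `rules` list, not the `violations` function, which is the sense in which rules can evolve with the data instead of being reimplemented per application.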
System-level innovations that need data enthusiast backing
Tony Seale, a knowledge graph engineer at UBS in the UK, made a telling point in a recent (April 2022) LinkedIn post: Enterprises need to give themselves the time and the resources to experiment with their data. But most organizations complain that they don’t have the time to experiment, he said. His main counterargument (quoting verbatim here):
But here’s the thing: unifying data and cloud into one network vastly reduces the time and cost of simple data analysis and allows you (and everyone else within your organization) to experiment and uncover what might be the completely game-changing ideas that are hidden within the connective tissues of your organization’s data.
Tony Seale, UBS, 2022
Seale here notes just one of the key payoffs of data-centric architecture innovation. But many people responsible for data may be unaware of other forms of innovation that also free up resources to do more with data.
System-level innovation examples
|Innovation||Description and benefits|
|Built-in data integration||Seale’s exactly right to use the descriptor “built-in”. Knowledge graphs are designed for integration and interoperation. Every semantic graph wants and needs to be connected with other graphs. Semantic graphs are symbiotic and can infer other graphs. Integration can be an iterative process with humans in the loop as well as machines. SQL RDBMSes, by contrast, seem to treat integration as an afterthought. Moreover, the integration process required is inorganic, with changes difficult to make.|
|Data resource intelligence and greater network effect||Bob Metcalfe makes the point that the network effect can be beneficial in more ways than most people realize. HTTP Uniform Resource Identifiers (URIs, which serve as globally unique identifiers) help, for example, by making disambiguation feasible. Helpful data that sits undiscovered could become discoverable when the disambiguated context exists in a less siloed, more expansively networked environment.|
|Less centralized storage||Solid pods can eliminate whole swaths of data duplication. In a construction supply chain context, as Graphmetrix points out, the data storage equivalent of GitHub version control becomes possible. The latest version of a document or other data can be continually tracked across the network and access controlled in a decentralized fashion. Each OEM, supplier, shipper and warehouse provider in this scheme maintains access control to the data in its own Solid pods.|
|Self-sovereign identity||The vast majority of consumers and enterprises don’t realize it, but identity systems don’t need to depend on the transmission and storage of personal data to verify an individual’s necessary credentials. With a self-sovereign identity (SSI) approach, users keep correlatable identifiers (such as passport numbers, social security numbers and home addresses) encrypted on their phones. On-device matching eliminates the need to duplicate or store additional copies of the identifiers.|
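The first two rows of the table, built-in integration and URI-based disambiguation, can be illustrated with a small sketch. It assumes, purely for illustration, that two departments each publish facts as (subject, predicate, object) triples and happen to use the same HTTP URI for the same person. The `example.org` identifiers, the predicate names and the `merge` and `describe` helpers are hypothetical and do not come from any specific triple store.

```python
# Hypothetical data: two separately maintained graphs that share one URI.
PERSON = "https://example.org/id/person/42"
ORG = "https://example.org/id/org/7"

hr_graph = {
    (PERSON, "name", "Ada Smith"),
    (PERSON, "department", "Finance"),
}
crm_graph = {
    (PERSON, "accountManagerFor", ORG),
    (ORG, "name", "Acme Ltd"),
}


def merge(*graphs):
    """Integrate graphs by set union; shared URIs line up without a mapping step."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = merge(hr_graph, crm_graph)
for predicate, obj in describe(combined, PERSON):
    print(predicate, "->", obj)
```

Because both graphs identify the person with the same globally unique URI, the union alone is enough to connect the HR facts with the CRM facts. In a tabular setting, the equivalent step would typically require agreeing on join keys and reconciling schemas first.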
I could list a dozen or more technological innovations that could help solve some of the most pressing problems enterprises face. But what’s missing is an enterprise view of how these technologies can and should be complementary at a system level. The siloed organization doesn’t grant sufficient authority, awareness or budget to those who work across the silos. And those who do are scarce.