Data-centric development: A hypothetical tech manufacturing example

My son and I both had stints in 2022 at low-volume, high-margin, long-established manufacturers. My son was doing assembly for a power management system maker with renewable energy/smart grid customers. I was doing compliance document management and related analytics for a bioengineering equipment maker.

Both companies had some of the same nagging challenges. My son characterized the biggest problem he saw as inventory tracking and control. Assemblers lacked confidence that the subsystems they needed would be available in the right state of assembly when and where they needed them.

At the company I experienced firsthand, workers couldn’t predict or assure the availability of various kinds of resources, whether parts, other supplies, specialized labor or something less tangible, such as approvals.

The information systems at both companies are typical of what I’ve encountered during my 35 years here in the Valley. The companies are both well-managed and profitable. But the data architecture ultimately fails. It fails because of the propensity to balkanize or strand data that should be consistently woven into a unified, discoverable whole, so that the right information can get to where it needs to be at the right time, for the right purpose(s).

How a great business culture can fall short

The bioengineering equipment maker workforce I got to know has a powerfully effective, enlightened culture and committed leadership, and it delivers sophisticated products and services to some very discriminating and knowledgeable buyers. Even so, workers are compelled to use disconnected bits of dozens of applications and multiple SaaS suites to get work done. It’s as if that’s the only way they know to do what they’re doing.

Duplication of effort is the only way to get a system qualified, shipped, and installed at the customer site. Each unit of operations understands the root problems but doesn’t have the authority, budget, or understanding of data architecture innovation to solve these problems. It’s fortunate the margins are high and demand strong. Otherwise, even the highest-performing business units might go into the red.

The result is time wasted and a workforce underutilized. The workforce spends considerable time either working around information gaps, or searching for information that’s been buried or isn’t in the form needed to finish tasks. Time that could be spent on higher-order activities instead has to be spent on extra data entry, emails, meetings, or phone calls to address small issues that invariably gum up the process.

How data-centric development opens new opportunities

All organizations that sell products and services have their own version of the manufacturing company data challenge I’ve just described. They may not be making or shipping products, but the information dilemma generally is endemic to organization-wide systems.

Data-centric development (DCD) takes a different tack on how to address the issue. In fact, DCD is less centralized and less tightly coupled than conventional enterprise development approaches.

Think agents or digital twins on the client side designed to be interactive, instead of just APIs, for instance. And think knowledge graph(s) (KGs) on the server side that make up a semantic integration layer designed for all types of data and content. The KG methodology has its heritage in the content world, though it’s just as applicable for transactional and analytics data.

The following list describes how DCD can address process and supply chain disconnects. It assumes that a DCD best practice is standard semantics within a knowledge graph building and implementation context. Description and predicate logic (think set theory, or the coded equivalent of Venn diagrams and Boolean logic, for starters) empower that graph.

Relationship-rich, smarter data is the norm. Data centricity within a KG context implies that organizations are committed to doing more with their data. Smarter data helps because the relationships (the verbs) that connect entities (subjects/objects) are explicit, well articulated and dynamic. Contrast this with table linking and foreign keys, where relationship logic is stored somewhere other than with the data.
In the case of KGs, relationship (predicate and description) logic is in the graph. This design initially confuses developers who feel that the “logic should be in the app,” period.
Both broad pattern and entity/event context modeling can be used to model the business. A primary objective with DCD is contextual computing: data explicitness, clarity and reuse potential derive from developers helping to build each relevant context, such as department, business line, partner or supplier context.
The richness and connectability of the data enable context modeling across hundreds or thousands of sources. Current techniques for modeling an ontology (a graph data model that describes, for example, how a business runs in an abstract, reusable way) are both pattern and event oriented. An ontology, business-specific taxonomies and rules constitute the shared intelligence that resides in the knowledge graph.
Developers call the descriptions and rules in the graph. Rather than recoding the logic as a part of the application, developers use shared logic from the knowledge graph. The logic is therefore just as accessible as the instance data.
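
To make the points above a little more concrete, here is a minimal Python sketch of logic living in the graph rather than in the app. It is an illustration under stated assumptions, not a real knowledge graph implementation: the graph is just an in-memory set of subject-predicate-object triples, and every name in it (PowerModule-7, requiresPart, allowedState, blocked_parts and so on) is hypothetical. A production graph would more likely rely on standards such as RDF and SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Instance data and rules sit side by side in the same graph.
# All entity and predicate names here are hypothetical.
GRAPH = {
    Triple("PowerModule-7", "isA", "Subsystem"),
    Triple("PowerModule-7", "requiresPart", "Inverter-A"),
    Triple("PowerModule-7", "requiresPart", "Controller-B"),
    Triple("Inverter-A", "hasState", "Tested"),
    Triple("Controller-B", "hasState", "AwaitingApproval"),
    # The rule is data too: a part counts as available only in these states.
    Triple("AvailablePart", "allowedState", "Tested"),
    Triple("AvailablePart", "allowedState", "Approved"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


def blocked_parts(graph, assembly):
    """Return the required parts whose state the graph does not allow."""
    # The app asks the graph which states are acceptable instead of hardcoding them.
    allowed = objects(graph, "AvailablePart", "allowedState")
    blocked = set()
    for part in objects(graph, assembly, "requiresPart"):
        # Set intersection: empty means the part is in no allowed state.
        if not (objects(graph, part, "hasState") & allowed):
            blocked.add(part)
    return blocked


print(blocked_parts(GRAPH, "PowerModule-7"))  # {'Controller-B'}
```

If the business later decides that parts awaiting approval can be staged, someone adds one allowedState triple to the graph, and every application that calls the shared rule picks up the change without being recoded.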

How this helps organizations

What’s most painful from a staff quality and efficiency perspective is that much of the data entry has to be done several times, in different places, for different contexts. The duplication is not only wasteful, but results in data that’s up to date in some places, but not yet in others, or clean in one way, but not analytics ready in another.

The more places humans and machines together generate the same or similar data, the greater the chance there is for misunderstanding or misinterpretation. Each of these interpretive disconnects constitutes its own bottleneck.

Data-centric development with its “create once, use everywhere” approach has the potential to ensure that bottlenecks are avoided or only need to be fixed once. A fix can become persistent and shared. Moreover, the intelligence created with the help of the information fabric can refine the information shared context by context, as well as identify discrepancies so they can be rectified before they create a larger problem.
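
As a rough illustration of identifying discrepancies before they grow, here is a small Python sketch. It assumes that facts about the same entity arrive from several systems as simple (source, entity, attribute, value) records. The source names, attributes, values and the find_discrepancies function are all hypothetical, and a real information fabric would do far more than this.

```python
from collections import defaultdict

# Hypothetical records: (source system, entity, attribute, value).
RECORDS = [
    ("erp", "Inverter-A", "quantity_on_hand", 12),
    ("shop_floor", "Inverter-A", "quantity_on_hand", 9),
    ("erp", "Controller-B", "approval_status", "approved"),
    ("compliance", "Controller-B", "approval_status", "approved"),
]


def find_discrepancies(records):
    """Group values by (entity, attribute) and keep the ones where sources disagree."""
    values = defaultdict(dict)
    for source, entity, attribute, value in records:
        values[(entity, attribute)][source] = value
    return {
        key: by_source
        for key, by_source in values.items()
        if len(set(by_source.values())) > 1
    }


for (entity, attribute), by_source in find_discrepancies(RECORDS).items():
    print(entity, attribute, by_source)
```

With the sample records, only the Inverter-A quantity is flagged, because the two systems that report Controller-B’s approval status agree.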