
Losing control of your company’s data? You’re not alone

Photo by Hermann Traub on Pixabay

To survive and thrive, data likes to be richly and logically connected across a single, virtually contiguous, N-dimensionally extensible space. In this sense, data is ecosystemic and highly interdependent, just as the elements of the natural world are. Rich connectivity at scale is why graph databases and decentralized webs such as the Interplanetary File System make sense.

The better connected data is, and the more readily and holistically it can interact and evolve as part of an interoperable system, the more useful, resilient and reusable it becomes. That’s why siloing data and stranding description logic in compiled applications chokes data off and eventually kills it, even though it may be high quality and useful to begin with.

Stratos Kontopoulos of Foodpairing AI made a key point about the value of interoperability at the data layer in an April 2024 LinkedIn post. He observed that when the carbon and water footprint data for avocados, oranges and pineapples is updated in Wikidata, those updates refresh Foodpairing’s W3C standards-based knowledge graph. Those standards simplify such integrations, allowing interoperability at scale.
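
To make that concrete, here is a minimal sketch in Python, using the SPARQLWrapper library, of the standards-based pattern Kontopoulos describes: a scheduled job queries Wikidata’s public SPARQL endpoint and hands the results to the local knowledge graph. The footprint properties (ex:carbonFootprint, ex:waterFootprint) and the Q_* item IDs are hypothetical placeholders, not confirmed Wikidata terms; the point is the pattern, not the exact vocabulary.

```python
# Minimal sketch only. SPARQLWrapper is a common Python client for SPARQL
# endpoints; the footprint properties and Q_* item IDs below are hypothetical
# placeholders, to be replaced with the real Wikidata terms.
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/footprint/>              # placeholder vocabulary

SELECT ?food ?label ?carbon ?water WHERE {
  VALUES ?food { wd:Q_AVOCADO wd:Q_ORANGE wd:Q_PINEAPPLE }   # placeholder IDs
  ?food rdfs:label ?label . FILTER(LANG(?label) = "en")
  OPTIONAL { ?food ex:carbonFootprint ?carbon . }            # placeholder property
  OPTIONAL { ?food ex:waterFootprint  ?water . }             # placeholder property
}
"""

def fetch_footprints():
    """Pull the latest footprint figures so the local graph can be refreshed."""
    client = SPARQLWrapper(WIKIDATA_ENDPOINT)
    client.setQuery(QUERY)
    client.setReturnFormat(JSON)
    rows = client.query().convert()["results"]["bindings"]
    return [(r["label"]["value"],
             r.get("carbon", {}).get("value"),
             r.get("water", {}).get("value")) for r in rows]

if __name__ == "__main__":
    for label, carbon, water in fetch_footprints():
        print(label, carbon, water)
```

Because the contract on both sides is a W3C standard (RDF plus SPARQL) rather than a vendor API, the same fetch-and-refresh pattern works against any standards-based source, which is what makes this kind of integration scale.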

How companies have been losing data control

Most companies that have been around for fifty years or more have been losing control of their data at an accelerating pace ever since the advent of the personal computer, as data fragmentation compounds generation by generation. They lose control whenever they place more of their trust in incumbent application providers, a group that includes the largest software companies on the planet.

The more they sign up for new applications, the more control these companies lose over what should ideally be a single, interacting, unitary asset across the data layer. The more they migrate to public clouds while holding on to an application-centric, fragmented view of data, the more control they lose, and the harder data integration becomes.

As Dave McComb points out, each application created under this old development paradigm generates its own data model, creating yet another integration task. By contrast, development driven by a semantic, standards-based knowledge graph enforces a single shared data model across applications.
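
As a rough illustration of that single shared model, here is a short Python sketch using rdflib. The example.org vocabulary and the two notional “applications” are invented for this example; the point is that both write against one vocabulary and one graph, so there is never a second data model to reconcile.

```python
# Minimal sketch, not a production design: two notional applications share one
# vocabulary (the hypothetical http://example.org/model/ namespace) and write
# into one graph, so no cross-application data model ever has to be reconciled.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/model/")   # the single shared data model

graph = Graph()
graph.bind("ex", EX)

def billing_app_record_invoice(customer_id: str, amount: float) -> None:
    """The billing 'app' uses the shared Customer and Invoice terms."""
    customer = EX["customer/" + customer_id]
    invoice = EX["invoice/" + customer_id + "-001"]
    graph.add((customer, RDF.type, EX.Customer))
    graph.add((invoice, RDF.type, EX.Invoice))
    graph.add((invoice, EX.billedTo, customer))
    graph.add((invoice, EX.amount, Literal(amount)))

def support_app_record_ticket(customer_id: str, summary: str) -> None:
    """The support 'app' reuses the very same Customer term, with no remapping."""
    customer = EX["customer/" + customer_id]
    ticket = EX["ticket/" + customer_id + "-001"]
    graph.add((customer, RDF.type, EX.Customer))
    graph.add((ticket, RDF.type, EX.SupportTicket))
    graph.add((ticket, EX.raisedBy, customer))
    graph.add((ticket, EX.summary, Literal(summary)))

billing_app_record_invoice("acme", 1200.0)
support_app_record_ticket("acme", "Invoice PDF will not open")
print(graph.serialize(format="turtle"))
```

Both functions converge on one description of the same customer, which is exactly the outcome a per-application data model forecloses.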

It doesn’t help that the C-suite has historically been passive and myopic when it comes to data. The 80/20 rule applies to companies and to the C-suites that run them. 

The >80 percenters (doomed to mediocrity or worse) accept bad advice when it comes to automation and run with it. Years later, when revisiting a decision that resulted in a clear failure, these companies neglect to get to the root cause of the problem, which means they keep making the same mistakes. The <20 percenters (those who succeed), by contrast, learn from their mistakes.

It’s very hard to disabuse the >80 percenters of the notion that subscribing to an app can solve a problem. The point of an app, they may not have realized, is actionable data. If the app instead is walling data off and making it less accessible, it defeats the purpose of data-driven decision making.

I’m still hopeful that the <20 percenters will come around to changing their development paradigms so that they can put organic, interoperable, contiguous data first.

Data loss in the 2000s

Nicholas Carr wrote a widely read book, Does IT Matter?, an expanded version of his 2003 Harvard Business Review article “IT Doesn’t Matter,” which basically recommended that big companies outside the tech sector hand off IT responsibilities to others. In the HBR article, Carr said, “You only gain an edge over rivals by having or doing something that they can’t have or do.”

Therefore, Carr reasoned, because IT had become a commodity and was not core to most businesses, those businesses should outsource IT entirely. Don’t bother understanding IT, he figured; others will take care of it. 

Carr couldn’t have been more wrong. The only way enterprises can save themselves is by being aggressively proactive and protective of the data layer they’re responsible for. The only way to protect their data layer scalably is by enforcing a unitary, extensible data model.

Why the loss of data control? Faulty, dated paradigms still in place

Outsourcing 2000s-style, a la Carr’s vision, turned out to be problematic. Then the advent of the iPhone and Android spawned a new wave of mobile apps, so going entirely hands-off didn’t happen. Companies outside the tech sector instead reinvested in internal development teams or contracted with third-party developers for mobile apps.

Mobile was no better than desktop in terms of its development paradigm. It perpetuated the same flawed approach: data siloed by default, and logic trapped inside the compiled application when that logic needed to live with, and be updated alongside, the data. The result was even more fragmentation, because so many new mobile apps appeared.

Every time companies spend more money on the same old status quo application-first, data-takes-a-back-seat development paradigm, they lose more control. That is uncontrolled silo sprawl, and silo sprawl means loss of data control by definition.

This loss of data control isn’t limited to traditional applications, and it’s not just due to the growing decentralization of computing. Delegation to third parties, for instance, has become much easier. Today’s workflows are replete with application functionality leased from third parties, which, among other concerns, increases the magnitude of risk that companies expose themselves to. Every time a company adds a partner to actively manage a piece of its workflow, it increases the risk footprint it has to manage.

AI trends are also having their own considerable accelerating impact, from a risk perspective and otherwise. For example, the quirky, horrendously inefficient data lifecycle management habits and inorganic, lossy data development paradigms associated with most machine learning also undermine organizational control of data.

A major problem for organizations adopting generative AI right now is that they’re unwittingly and uncontrollably sharing data with third-party generative AI providers, data they might think twice about sharing if they understood more about where it actually ends up.

How more data loss happens in traditional and AI apps

Organizations used an average of 130 software-as-a-service (SaaS) apps in 2022, up from 80 in 2020, according to BetterCloud’s 2023 State of SaaSOps report. Assuming a 15 percent annual average growth rate for the SaaS market in 2023 and 2024 (Statista’s estimate), and assuming app counts grow at a similar pace, the average organization may now be using around 170 SaaS apps, not counting shadow applications billed to individual employees’ credit cards that the organization hasn’t been able to track.
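
The back-of-the-envelope arithmetic behind that estimate is below; applying the Statista market growth rate to BetterCloud’s per-organization app count is an assumption, not something either source claims.

```python
# Rough compounding estimate only; it applies the Statista market growth rate
# to BetterCloud's per-organization app count, which is an assumption.
apps_2022 = 130      # average SaaS apps per organization, 2022
growth_rate = 0.15   # assumed annual growth, 2023 and 2024

apps_2024 = apps_2022 * (1 + growth_rate) ** 2
print(round(apps_2024))   # ~172, before any untracked shadow SaaS
```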

Large enterprises with over $100 billion in annual revenue that haven’t implemented strict procurement controls may each be subscribing to thousands of different SaaS apps. Most of those apps will likely be underutilized, with only a subset of potential users making effective use of them.

Traditionally, each application, whether SaaS or not, is tightly coupled to its own database with its own data model, or at least the data model of the provider. If you’re using SAP, you may be using Integration Suite. If you’re using Adobe, perhaps it’s Creative Cloud.

Large enterprises may be using dozens of different third-party integration platforms. When they hire a new staffer responsible for integration, that staffer may specify their own integration platform preference. Most staffers aren’t thinking of a larger, system-wide integration challenge. They’re just trying to solve the problem that immediately concerns them. The result is a proliferation of integration tools in addition to other SaaS apps. Procurement departments may not be aware this proliferation is happening, or why.

Harnessing the full potential of a traditional app’s data requires specific, system-level integration effort. Even if you’re using a capable integration platform, you have to deal with the fact that the data the app creates tends to be siloed by default. Integrating it requires a separate effort, and when you’re done, it may only be application-layer integration, which is far less flexible and capable than the best data-layer integration methods.
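
Here is a sketch of what data-layer integration can look like, reusing the hypothetical shared vocabulary from the earlier example and an invented CRM export format: the siloed app’s records are mapped once onto the shared model, after which any consumer can query the graph directly instead of stitching together application APIs.

```python
# Sketch only: the CRM export format and the example.org vocabulary are
# hypothetical. The idea is to map a siloed app's data onto one shared model
# once, so downstream consumers work at the data layer, not the app layer.
import json
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/model/")

def map_crm_export_to_shared_model(crm_json: str) -> Graph:
    """Map one siloed app's records onto the shared vocabulary."""
    g = Graph()
    g.bind("ex", EX)
    for record in json.loads(crm_json):
        customer = EX["customer/" + record["acct_no"]]
        g.add((customer, RDF.type, EX.Customer))
        g.add((customer, EX.name, Literal(record["acct_name"])))
    return g

crm_export = '[{"acct_no": "acme", "acct_name": "Acme Corp"}]'
shared_graph = map_crm_export_to_shared_model(crm_export)

# Any downstream consumer can now ask questions at the data layer via SPARQL.
for row in shared_graph.query(
    "PREFIX ex: <http://example.org/model/> "
    "SELECT ?name WHERE { ?c a ex:Customer ; ex:name ?name }"
):
    print(row.name)
```

The mapping is written once per source against the shared model, rather than once per pair of applications, which is why data-layer integration tends to scale better than point-to-point glue.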

The implication is that for established organizations in 2024, what dataware company Cinchy calls an “integration tax” could be growing at 15 percent or more every year for typical SaaS apps. The integration tax for AI apps must be substantially higher, considering the manual effort required to bring all of them into the fold in a data warehouse or data lake, for example.

Managing copied data in warehouses is another related issue, one that has spawned a zero-copy integration standard. (See https://www.techtarget.com/searchenterpriseai/tip/The-role-of-trusted-data-in-building-reliable-effective-AI for more information.)

Interoperable ecosystems, not just data sprawl that fills storage and processing clouds

The dominant development paradigms in use today silo data and strand code by default. What’s the alternative? A development paradigm that promotes a unitary, organic, interacting data ecosystem such as the one Dave McComb discusses in his books. 

If the <20 percenters push for such a paradigm by starting with semantically (meaningfully, logically) connected data in knowledge graphs, the >80 percenters and their vendors will follow, because they’re followers anyway. But the overarching theme here is that companies have to be hands-on, focusing on making the data layer manageable and reusable. Data, after all, is what determines the shape of AI models, and core internal data is what companies run on.