Being an Ontologist

I am sometimes asked whether I am working on the stats, whether I am making progress on the stats, and what I do with all of the stats. People are also prone to hyperbole. I am told that I sure work on a lot of stats, I am always keeping myself busy doing stats, and I am the person to go to for stats. I suppose my real job is more mysterious than the one others imagine that I do. I first want to explain that for everyday people, the term “stats” or “statistics” often means historical data rather than statistics in a substantive sense. So people actually mean that I surround myself with data, which is certainly true. However, I would say that I don’t do much true statistics. (I do some statistics, mostly in conjunction with charts and reports.) I spend most of my time on ontology, which enables me to transform resources into useful metrics.

I didn’t know the meaning of ontology prior to my graduate studies. By sheer chance, I was asked to explain ontology to my class. I said that it certainly isn’t oncology, which relates to the study and treatment of cancerous tumours. Ontology is the study of existence or being. Worded differently, it is the study of how things come into existence or being. I once listened to an academic provide his perspective; I apologize for forgetting his name. He said that ontology can be interpreted as how things gain relevance. My perspective has always been that ontology gives rise to data. Data is a symbolic representation of something that we have chosen to recognize as relevant; there are layers of this from the perceived reality to the data that appears on my charts.

After searching Google (in the English language), I am prepared to say that there is currently no such occupation as an ontologist or forensic ontologist. This of course is quite disturbing given the amount of time I spend doing what I do. I suggest that for companies willing to make the investment, a statistician can probably be replaced by a machine – that is to say, his or her job can be automated. On the other hand, many of the duties performed by an ontologist cannot be done by a machine. These are bold statements to make especially since there is no such thing as an ontologist. So I would like to take this opportunity to discuss the sorts of processes that might exist in an organization where such an individual would probably be relevant.

Developing Data Criteria

Managers are responsible for decisions involving capital – including human capital. Regardless of the amount of research a company does, deliberate change is unlikely to occur without capital. Capital gives rise to material developments. Material developments by nature have a tangible impact on the organization, including the exhaustion of capital resources. Given that the ensuing gains and losses are real, it would be understandable for managers to seek out data to support their decisions. Managers might have certain “general ideas” about the nature and type of data they would like to obtain. But since they seem unlikely to collect it themselves or be responsible for the logistics of the data collection system, the “criteria” giving rise to the data seem likely to emerge collaboratively. There has to be a shared vision and understanding; this would certainly be more probable if the ontologist is capable of dealing with business concerns.

While perhaps not always articulating it in a formal sense, a manager probably has his or her own sense of ontology that must be incorporated into the criteria: e.g. “In some cases the packages are arriving damaged because of shipping. I suspect there are many situations like this. We need to be able to distinguish production damage from what takes place during shipping. I would like to start monitoring what leaves the warehouse and what gets returned to the warranty department.” A statistician listening to this might yelp, “What do you want me to do about it? Just give me the data!” This is because a statistician isn’t actually involved. They come afterwards. What the manager wants has to eventually give rise to data allowing the company to better understand losses perceived to occur during shipping. But shipping might not actually be the cause of the problem. It is necessary to seek out not the specific truth the manager raises but the broader truth of what contributes to damage.

Developing Schemata for Data

A term that I also picked up during my graduate studies is “schema.” I routinely made use of the term to describe the hierarchical structure, classifications, and divisions of a social lens that I designed. While I have no doubt that more elaborate definitions might exist, I will use the term here to refer to how reality can be arranged (modeled – since reality isn’t literally being arranged) in order to give rise to data. For example, the shipping problem mentioned earlier by the manager might be logically partitioned in relation to departments (plant, warehouse, and shipping); divisions (central distribution, wholesale, mass merchant, and specialty sales); functions (production, logistics, and customer support); roles (supervisor, driver, inspector, and sales agent); and behaviours (entered orders, unloaded skids, and packaged merchandise). The characterization of reality makes it possible to extend the narrative from specific aspects.
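The partitions listed above can be sketched as a small data structure. This is only an illustration under my own assumptions: the names `SCHEMA`, `Event`, and `validate` are hypothetical, and the allowed values are simply copied from the shipping example rather than taken from any real system.

```python
from dataclasses import dataclass

# Hypothetical schema: dimension names and allowed values are copied from the
# shipping example in the text; a real organization would define its own.
SCHEMA = {
    "department": {"plant", "warehouse", "shipping"},
    "division": {"central distribution", "wholesale", "mass merchant", "specialty sales"},
    "function": {"production", "logistics", "customer support"},
    "role": {"supervisor", "driver", "inspector", "sales agent"},
    "behaviour": {"entered orders", "unloaded skids", "packaged merchandise"},
}


@dataclass(frozen=True)
class Event:
    """One observed event, described along every dimension of the schema."""
    department: str
    division: str
    function: str
    role: str
    behaviour: str


def validate(event: Event) -> list[str]:
    """Return the dimensions whose value falls outside the schema."""
    return [
        dimension
        for dimension, allowed in SCHEMA.items()
        if getattr(event, dimension) not in allowed
    ]


# An event that fits the schema, and one whose role the schema does not recognize.
ok = Event("warehouse", "wholesale", "logistics", "driver", "unloaded skids")
bad = Event("warehouse", "wholesale", "logistics", "pilot", "unloaded skids")
```

An event that the schema cannot describe is exactly the kind of thing that never becomes data, which is why the choice of partitions matters before any collection starts.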

I suggest that a manager, while likely to have general ideas about data, would probably not have time to develop schemata. Can the job be outsourced to a specialty firm in another country? In order to outsource, the manager would have to pre-chop reality and take the time to explain the nature of the partitions – or hire somebody to pre-chop and explain. Why hire somebody to facilitate outsourcing in a process likely to lead to inferior results? Why would the manager do the job when the intent is to have somebody else do it? It is far cheaper, faster, and safer just having somebody familiar with the business – other than the manager – do the job. This is not to say that a person with control over capital resources might not also be responsible for the data. I am just trying to establish a logical partition.

Esoteric schemata might include things that I have written about in the past – e.g. elements to distinguish between internal capacity and external demand. A human resources perspective would tend to focus on the internal – a person’s abilities. An engineering focus might be on the external – equipment, devices, and structural implements. There is no capital involved in this ontological and fairly technical conversation about schemata, which limits the involvement of managers. However, creative freedom comes at a price since lack of tangible results would likely become apparent at some point. Imagine the horror of not being able to explain an excuse to somebody controlling capital resources.

Deploying Protocols for Data

I remember an episode of 24 in which the team changed protocols to deal with a terror alert. A protocol explains how things should be done – in my case, how the data should be constructed. This is my definition anyway. Although not particularly accurate in every case, it might be useful to think of a protocol as a written-out formula. I prefer to think of it as a prescribed framework for the data. For example, imagine a pool of 1,000 elements of data: on a spreadsheet, this might appear as 1,000 columns. I don’t know the shorthand for logical equations, so I offer a bit of code:

    if ((col[18] > col[0] && col[0] != -1) ||
        (col[69] > col[54] && col[54] != -1) ||
        col[100] % 16 != 0) {
        dir.sense = true;
        dir.full = true;
        dir.cost = col[999];
    }

Among the implications of this code is that “cost” emerges from the pool. In practice, the problem hasn’t already been reduced to variables but rather identifiable fields and data resources. It is possible to define and reason out the parts of the protocol before setting the data-collection system in motion.
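The rule quoted above can be restated as a small runnable function. This is a sketch in Python, not the code I actually use: the `Direction` class and `apply_protocol` function are hypothetical names, and reading -1 as "no reading available" is my assumption about what the sentinel means.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MISSING = -1  # assumption: -1 marks a field with no reading


@dataclass
class Direction:
    """Hypothetical stand-in for the `dir` object in the quoted rule."""
    sense: bool = False
    full: bool = False
    cost: Optional[float] = None


def apply_protocol(col: Sequence[float]) -> Direction:
    """Apply the quoted rule to one row of 1,000 fields."""
    if len(col) != 1000:
        raise ValueError("expected a row of exactly 1,000 fields")

    # Each comparison only counts when its baseline field holds a reading.
    first_pair = col[0] != MISSING and col[18] > col[0]
    second_pair = col[54] != MISSING and col[69] > col[54]
    # A non-zero remainder is treated as true, as in the C-style original.
    remainder = col[100] % 16 != 0

    result = Direction()
    if first_pair or second_pair or remainder:
        result.sense = True
        result.full = True
        result.cost = col[999]  # "cost" emerges from the pool
    return result


# One row where the first comparison fires, and one where nothing fires.
active_row = [0] * 1000
active_row[0], active_row[18], active_row[999] = 5, 9, 42
quiet_row = [0] * 1000
```

Naming the three conditions separately makes it easier to reason about each part of the protocol before any data is collected.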

I wish I could provide a more realistic example without necessarily giving exact details of my own work. Let us say that the customer service desk at a major retailer generates all sorts of data. A brute statistical approach would be to mine that data seeking out patterns, trends, and indicators. An ontological approach would be to ask difficult questions like, “What is a customer complaint? What aspects of the data generated should be counted and in what manner to give rise to a worthwhile characterization of a customer complaint?” This is a really philosophical and intellectual process where one has to appreciate implications and consequences. If the protocol is badly constructed, aspects of reality might become invisible – “by design!” – which is terrible.
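To make the point about invisibility concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the records, the `kind` and `reason` fields, and the two competing definitions of a complaint are assumptions, not data from any real service desk.

```python
# Hypothetical service-desk records; field names and values are invented.
records = [
    {"kind": "return", "reason": "defective"},
    {"kind": "return", "reason": "changed mind"},
    {"kind": "inquiry", "reason": "late delivery"},
    {"kind": "complaint", "reason": "rude staff"},
]

# Assumed list of reasons that signal dissatisfaction, whatever the form says.
DISSATISFIED_REASONS = {"defective", "late delivery", "rude staff"}


def narrow_protocol(record: dict) -> bool:
    """Count only what the desk itself filed as a complaint."""
    return record["kind"] == "complaint"


def broad_protocol(record: dict) -> bool:
    """Also count any contact whose reason signals dissatisfaction."""
    return record["kind"] == "complaint" or record["reason"] in DISSATISFIED_REASONS


def count_complaints(rows: list, protocol) -> int:
    """Count the rows that a given protocol recognizes as complaints."""
    return sum(1 for row in rows if protocol(row))


narrow_total = count_complaints(records, narrow_protocol)
broad_total = count_complaints(records, broad_protocol)
```

The same four records yield different complaint counts depending on the definition chosen, so the defective return and the late delivery exist in the data only if the protocol was written to see them.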

Characterizing Complex Systems

I remember that in an episode of Bones, Bones expressed her willingness to be impressed if her intern had a background dealing with “complex systems.” On episodes of Fringe, Dr. Bishop and his son routinely attempted to resolve fairly complex problems mathematically by solving equations. Well, before the problem becomes a “mathematical problem,” it exists first as an ontological obstacle made particularly challenging by the complexity of circumstances likely associated with systems. Fringe is particularly disturbing in the doctor’s ability to cherry-pick variables as if reality were adequately codified to allow for such handling – as if reality exists within math rather than math existing within reality. The assertion that variables are even isolated or disconnected enough to plug into an equation denies that variables in systems are sometimes interconnected and interdependent. In any event, an organization, especially if it is quite large, might have to be handled as a complex system.

It is easy to understand how ontology is associated with “input” since data feeds further analysis. Perhaps less apparent is how an ontologist plays a role influencing “output” – through ontology – by modelling complex systems. For example, the manager concerned about losses resulting from shipping might not be aware of the general lack of shipping data; this historical absence of data means that the transition to driverless vehicles lacks baselines to expedite decision-making. Alternatively, the collection of shipping data in a particular manner might propel the transition forward. It is necessary to collect not just any data but data that could be used to support transition. Ontology serves to help model complex systems through schemata and protocols. This is persistent intellectual capital that might be more about the future of the company than the specific needs of the manager to reduce shipping losses.

Data Forensics

When an organization finds itself going in the wrong direction, it might be because data led the organization down that road. It is reasonable to contend that an organization should take responsibility for choosing to follow its data – as evidenced by years of continuing down that road. However, I consider it important to distinguish between the faults of an organization and its faulty data, since with faulty data there is at least some hope of rectification. If a product shows an extremely high level of customer satisfaction, this doesn’t mean that people will continue buying it. Even fabulous horseshoes sell poorly in a world dominated by cars. An analysis premised on high satisfaction levels leading to high sales is therefore misguided. Once an organization is in a bad place, I would reenter the data environment like a commando to identify what went wrong. I suspect that “failure of complex system characterization” is responsible for steering some organizations the wrong way. I understand that business schools sometimes offer up “shake-and-bake” or “cookie-cutter” approaches where students use the simplest similarities between cases to arrive at decisions. A data regime premised on a certain kind of thinking might contribute to ontological impairment. A forensic ontologist tries to determine how the data came to be so defiled that it led the company astray.

Ontology as a Branch of Data Science

Invariably there seems to be this separation between “studies” and “science” giving rise to disciplines leading either to an MA or MS. There is a barrier to Eden in relation to ontology, which in many respects seems like a philosophical area of study; and yet it directly supports or operates in conjunction with data science and computer science. I complicate matters by incorporating ontological theories into my design of computer programs and in my development of databases. When I first read about ontology, I thought, “Oh no, it’s the study of being! I can’t believe I have to explain this in a geography class.” It felt like a dislocated shoulder. Yet I consider it such a beautiful, rewarding, and engaging area of study, particularly in combination with data systems. It is like existing in a peaceful place in the middle of a destructive hurricane. The world is chaotic and muddled outside. I start building little roads to make sense of that swirling complexity – to make it possible for the computers to do their job.
