When I returned to university to do a graduate degree, I was interested to discover how certain terms are subject to "intellectual interpretation." A word that I was asked to explain during one of my earliest classes was "ontology." Since this term was absent from my dictionary, I originally confused it with "oncology." I faintly recall that oncology involves the study of tumors. After consulting a few sources, I said that ontology is the study of how things come to exist or into being. I came across another perspective although I don't recall the source: ontology is the study of how things gain relevance or become recognized, indicating that existence can be regarded as a matter of recognition. Perhaps there is no monopoly on the exact meaning. However, I would say of ontology in relation to data science, it explains how meaning is attached to data and therefore how that data gains and retains meaning. For example, if I were asked to count the number of trucks in a parking lot, it isn't obvious what should be included: small pick-ups, tow-trucks, commercial hauling vehicles, dump trucks, and maybe heavy construction trucks. Consequently, if I have databases containing running counts of trucks found in various parking lots over a period of time, the comparability of the data is worth questioning. It might be necessary to ask some fairly philosophical questions: under what circumstances, how, and why is something a truck? We find ourselves with a fairly esoteric topic colliding with the concrete demands of a data-oriented world.
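The truck-counting problem can be made concrete with a minimal sketch. The vehicle categories and the two definitions of "truck" below are hypothetical, invented only for illustration. The point is that the same parking lot yields different counts depending on which definition the counter happens to apply.

```python
# Hypothetical observations from a single parking lot (illustrative labels only).
observed = ["pickup", "tow_truck", "dump_truck", "sedan", "pickup", "hauler", "suv"]

# Two hypothetical definitions of "truck" that different counters might apply.
NARROW_DEFINITION = {"hauler", "dump_truck"}
BROAD_DEFINITION = {"pickup", "tow_truck", "hauler", "dump_truck"}


def count_trucks(vehicles, definition):
    """Count the vehicles that qualify as trucks under the given definition."""
    return sum(1 for vehicle in vehicles if vehicle in definition)


narrow_count = count_trucks(observed, NARROW_DEFINITION)
broad_count = count_trucks(observed, BROAD_DEFINITION)
print(f"narrow definition: {narrow_count}, broad definition: {broad_count}")
```

If two databases of running counts were built under different definitions, the numbers would look comparable while describing different things.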
In feminist literature, which I rather enjoy reading when I have the opportunity, one might encounter the question of what makes a person a man or woman? I know that at first glance and without deliberation, it might be tempting to pose genitalia as the distinguishing feature. We rarely go about our affairs in the world with our genitalia exposed. Some women are able to pass as men. Some men might be mistaken to be women. Certain men will be regarded as more masculine than others. I remember a comedy where one of the main characters said he had been raped in jail by a man. He was asked if the rapist was a black man - the explanation being that a scrawny white man probably seemed fairly feminine compared to a hulking black man. No. He was then asked if the rapist was Hispanic. No. The questioner went down a list of different types of men. In the end, the victim admitted that he had been raped by a Filipino. Perhaps Filipino men don't seem like imposing male specimens. I was born in the Philippines. I found the film rather funny. Well, my point here is that masculinity and femininity are more complicated than genitalia. On matters of ontology, I would say that a great many things are more complicated than they seem on the surface. When can a sapling be regarded as a tree? What makes a person Italian rather than French? What does it mean to be a Canadian citizen? In our time, we find ourselves trying to determine what makes a person a terrorist - the behaviours that seem suspicious. For some, ethnicity and religious inclinations are reasonable determinants. Then we try to give estimates and projections as if we understand everything.
Ontology in My Research
I personally cannot help but laugh - albeit with the greatest level of respect - at the idea of using a computer to interpret language in order to determine meaning. I laugh because I have difficulty myself trying to ascertain meaning - or, more specifically, confining meaning. Being in a room full of people speaking the same language doesn't mean that communication will go smoothly. Meaning and shared understanding might be dependent on ontology. I will give an example. I was studying customer service issues relating to people with disabilities brought forward to various human rights tribunals. I developed a model called ACTOR that I found aligned with tribunal decisions. ACTOR is an acronym for Attraction, Conduct, Tenacity, Organization, and Recognition. I noticed that when a tribunal adjudicator made ACTOR-related comments - either on the specific case or the character of a service provider - I could almost guess which side had won. I also found a connection between ACTOR comments and the damages awarded. None of this research has been published by the way - quashed prematurely by my departure from academic life.
In order to carry out my analysis, I went through the case documentation to locate ACTOR comments. Then I had to ask myself questions like the following: is that point about Conduct or Tenacity? What are Conduct and Tenacity, really? What makes Tenacity different from Organization? In the context of a case, it is difficult to make clear delineations bringing about the existence of ACTOR elements. Below is an image of a completed ACTOR survey sheet - just one of several dozen that I used to do my research paper. It is a joy surveying case documents rather than people. There is no need to submit the questionnaire for approval. Moreover, it isn't necessary to compensate participants. On the downside, it was necessary for me to assume the persona of the adjudicator - noted on the document as the "subject." I had to constantly ask myself, "What did I (the adjudicator) mean when I made that comment?" Since I was writing an academic paper, I couldn't just go by my opinion. I wrote down page numbers and specifics for each ACTOR-oriented comment that I encountered. Upon close examination of the sheet, it should be apparent that ACTOR is focused on the adjudicator's perspective of the service provider's mental processes as supported by the case documentation. It was a considerable challenge associating the "evidence" (the comments) with the appropriate ACTOR definitions. I was responsible for determining when the sentences created or became evidence in the context of ACTOR.
External and Internal Ontology of Data
When dealing with a database, the ontological basis of the data is routinely external. By this I mean that the designer of the database has set aside specific columns for data. Meaning is not intrinsic to the data but defined by the structure of the database. For example, one column might contain "quantity sold" and another "price per unit." One would not expect to find temperature under quantity sold. I know some might consider my point ludicrous and extreme. Well, weight and volume are indications of quantity. It might be possible to sell something based on BTUs. My point more generally is that the database defines the meaning of the data placed within it. If there are several databases, due to the externally defined nature of the data, one cannot immediately assume comparability. Somebody with an extremely good understanding of the data might apply adjustments to allow for greater levels of comparison. This means that the data would change - further emphasizing the fact that the ontological control is external.
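As a rough sketch of what externally defined meaning looks like in practice, consider two hypothetical databases that both have a "quantity sold" column but record it in different units. The schema names, the units, and the conversion factor are all assumptions made up for this example. The number in the column carries no meaning of its own; the schema supplies it, and any adjustment for comparability depends on knowledge held outside the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    unit: str  # the meaning is supplied by the schema, not by the stored number


# Two hypothetical schemas that share a column name but not a unit.
STORE_A = {"quantity_sold": Column("quantity_sold", "each")}
STORE_B = {"quantity_sold": Column("quantity_sold", "kg")}

# Assumed conversion known only to someone who understands the product.
KG_PER_ITEM = 0.5


def comparable(schema_a, schema_b, column):
    """Two columns are directly comparable only if their declared units match."""
    return schema_a[column].unit == schema_b[column].unit


def kg_to_each(value_kg, kg_per_item=KG_PER_ITEM):
    """Adjust a weight-based quantity into an item count; the data changes."""
    return value_kg / kg_per_item


if not comparable(STORE_A, STORE_B, "quantity_sold"):
    adjusted = kg_to_each(10.0)
    print(f"10.0 kg in store B is treated as {adjusted} items for comparison")
```

The adjustment step is where the external control shows: the stored value is rewritten to suit the comparison rather than the other way around.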
If we refer back to my ACTOR example, it should be apparent that subjectivity is an important consideration. The assertions are qualitative although I have assigned quantitative characteristics to the data. The meaning of the data doesn't have to abide by my assessment. The meaning exists irrespective of me or any database that I create. The ontological basis is intrinsic. I can create a database to try to fit the data, but this does not mean that the database defines the data; it merely provides structure for my particular application. For example, if there is a number under "quantity sold," this is the literal meaning. But if there is a number under "Tenacity," that number doesn't mean tenacity. In fact, it is quite difficult to provide a clear definition for tenacity. I attach meaning to something not entirely understood. The meaning exists beyond my contextual application. Suffice it to say that my ability to attach meaning through measurement does not mean that I understand everything about the object being measured - just the aspect of it that conforms to my contextual requirements.
Fallacy of Nominalism
Similar to ontology, I have encountered various interpretations for the fallacy of nominalism. I don't recall the exact reasons leading up to the conversation, but an undergraduate professor explained to me, "This is the idea that a person can, just by having a name for something, somehow have a total understanding." I have read other perspectives on the term. From an ontological standpoint, this might mean that things could gain relevance merely through identification and labeling; moreover, labeling or naming forms the totality of that relevance. These days in relation to data science, I believe that there is a similar line of reasoning. Rather than naming, we sometimes imply the totality of things through authoritative quantitative and statistical representation. Even if an object exists in a multiplicity of states and contexts, epistemological authority might be commandeered through the assertion of one context in particular. In this way, the data extracted from phenomena might become disassociated from reality, reflecting more its contextual application than the underlying truths of its existence. I have referred to this contextual domination as "instrumentalism" (although this term might already exist in other applications).
At one time, the separation between humans and machines was clear. But as factory environments emerged, people started to wear out and be replaced like machines. They started moving repetitively and incessantly like machines. The lines between the factory and a person became blurred. Moreover, the reach of the work environment has started to extend well beyond the workplace - influencing the way people structure their lives and their interactions with society. We might say, therefore, that the contextual dominance of the workplace has over a period of time dissolved the multiplicities once more accessible to people. The metrics of the workplace have served to define the value of people and their placement in society. I therefore find it difficult to separate ontology - these days involving the recognition of phenomena in data - from the power dynamics that have consumed society for many centuries. While we might not be dealing with names and labels per se, in many respects we continue the drive towards greater levels of nominalism through our quantitative expressions and assertions of phenomena.
Centrality of Ontology
I have so far identified three ways in which the relevance or existence of things can be externally defined and conveyed through data: 1) through labeling and naming; 2) through authoritative quantitative expression; and 3) by design using the structures of a database. I will now discuss definition in the conceptualization of process, which I admit is rather distant from generally recognized ideas relating to ontology. In our world today, technology and processes making use of it have become extremely complex. It seems that we tend to address this complexity by "black-boxing" processes. The expression "out of sight, out of mind" comes to mind. Once inside that black box, significant effort and highly specialized skills might be needed to rectify even minor problems. While the ontology of the past could sometimes be attributed to idealism, in the future I suggest that it will more frequently take shape as a consequence of technological developments. Many technologies are meant to persist over a period of time. Although the world is evolving and changing, the tools that we have to enable human existence within it are prescribed and predefined at design time. While the obsolescence of something tangible is apparent by its physical presence - e.g. a bag of horseshoes in a world of automobiles - the obsolescence of the emerging constructs of technology is far less so.
Consider what it is like to require a wheelchair for mobility in a world that by design is fairly inaccessible. The roads and buildings are designed to last for decades. Architects, urban planners, and perhaps many employers might fail to take into account those with particular needs: elderly people, pregnant women, those with injuries, paramedics, firefighters, and as I mentioned earlier individuals making use of mobility assistance. Just as the material world can be made inhospitable through the actions and decisions of designers, technology and processes can be disabling by design. There could be a bus stop at a corner . . . although buses no longer stop there . . . due to decisions to cut back on service. But not everyone has a car, the option to car-pool, or alternate bus routes. So if the transit service fails to take my needs into account - fails to consider my existence or the value of my existence in the scheme of things - my involvement in society might greatly diminish. Perhaps I am interested in settling down and building a home near that bus stop. Maybe if the bus actually stopped there, some people would get off to eat at a local restaurant. I don't pose myself as a public transit activist (although I believe in promoting accessibility and mobility). I'm just saying that sometimes, the attachment of value and relevance occurs not near or by those affected but perhaps extremely far away by individuals insulated from the consequences.
I will now describe a rather deep shift in the manifestation of ontology in society that I call centrality. In a modern production facility, automation is everywhere. Robots and machines operate within their design parameters. They sometimes provide signals: for instance, when a printer is out of paper, it might beep or blink to alert the user. Despite the possible use of sensors and data-loggers, machines often provide little or no feedback. A company can successfully produce products that are defective and that nobody wants to buy since the machines are not going to care or complain. In fact, even if the factory is flooded or on fire, its machines that are still able to operate would probably continue to do so regardless. Machines are extensions of designers - the individuals exercising power to set priorities and define what counts. In a highly automated production setting, the recognition of reality is centralized. Recognition and the attachment of relevance occur at design - just like in my bus stop example. In comparison, in a factory that contains a lot of people, these individuals exercise some autonomy and personal discretion not just in their behaviours but also in terms of their recognition of relevance and in their decision-making.
Many years ago, I instructed my bank to take monthly amounts from my bank account to purchase mutual funds. I enjoyed seeing my investments grow as a result of my purchases. After a number of months, I noticed that my bank account balance was not declining although my investment account showed regular investments. Since I was dealing with a major bank, I considered it unlikely that the discrepancy would last for long. As the months continued and I became richer, perhaps due to a glitch in the processing environment, I decided to contact the bank to set things straight. It seems that the discrepancy was "invisible" - that is to say, the internal systems of the bank lacked the ability to detect the discrepancy. I thought to myself, those responsible for setting up the system perhaps failed to take certain types of problems into account. Of course, these days, it is often difficult to reach humans to complain about service. Companies might be losing money every day without even being aware of the loss. This desensitization to reality is connected to the centrality of ontology. If humans were more - prevalent, abundant, present, I'm unsure which term to use - let's just stick with more humans; if there were more humans - allowed to exhibit natural human tendencies - at least there might be some chance of problem identification outside the design parameters. People can take into account things the designer did not.
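I do not know how the bank's systems actually worked, so the following is only a sketch of the kind of check that could make such a discrepancy visible. The record layout, the month labels, and the amounts are hypothetical. The idea is simply to compare purchases recorded on the investment side against withdrawals recorded on the bank account side and report whatever fails to match.

```python
def find_unmatched_purchases(purchases, withdrawals):
    """Return purchases that have no withdrawal with the same month and amount."""
    withdrawn = set(withdrawals)
    return [purchase for purchase in purchases if purchase not in withdrawn]


# Hypothetical records as (month label, amount) pairs.
purchases = [("month-1", 100), ("month-2", 100), ("month-3", 100)]
withdrawals = [("month-1", 100)]

unmatched = find_unmatched_purchases(purchases, withdrawals)
for month, amount in unmatched:
    print(f"{month}: purchase of {amount} has no matching withdrawal")
```

A check like this only detects what its designer thought to compare, which is the limitation described above: problems outside the design parameters stay invisible unless a person notices them.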
I recently tried to sign up for a first-aid course. I went through all of the registration pages online; at the end of the process, I was advised that there was an error processing my request. I was directed to phone a particular number. I phoned that number. I listened to quite a lot of options. Finally, I was told to leave a "general message" in the organization's inbox. To me, this seemed to imply that "specific messages" were not for the inbox; thus, I actually had no inbox in which to leave my specific type of message. I was left wondering what the IVR system designer meant by "general message." I concluded that my message wasn't of a general nature. However, the system designer had only anticipated or built the system to deal with general messages. Consequently, directing me to call a phone number without a human arbiter was entirely illogical. Taking into account the possibility that there might not actually be a process to deal with registration errors, I decided to send an email. I noticed that a competing organization offering first-aid courses offers "live chat" to provide assistance - for those that require more than what the system is designed to give. So the former organization might be losing all sorts of money without having the foggiest idea. Its "institutional response" is premised on the effectiveness of ontological centrality. The latter organization assumes that the process might fail, and humans have been assigned to compensate for any technological-procedural deficiencies.
My pick-up truck is part of that huge Takata airbag recall announced recently. I'm unsure about the exact nature of the recall, but I remember the word "shrapnel" mentioned on a few occasions. I made the observation in an earlier blog that quality control is not necessarily about controlling quality but ensuring conformance to design specifications. As such, it is possible for a company to produce and install defective, poorly constructed, or shoddy devices. Indeed, some devices can be tested and approved at laboratories by people likewise ensuring conformance to prescribed testing parameters. The detection of something "bad" (recognition of something bad) might occur at a much earlier point in production than the checker. For the checker does not define what is good or bad; he or she merely ascertains conformance or lack of it. If the need to save money is quite strong, there might not even be a human checker. After all, the work would likely be mundane. There might simply be machines ensuring the proper installation of specified components. Not only can problems be rendered "invisible" to the organization, but opportunities for improvement might likewise be centralized - i.e. limited to the designers. Matters of ontology are really relevant in business. It wouldn't be enough to say that the data seems to indicate that the parts conform to quality standards; or that the process conforms to design specifications. To be sensitive to reality post-design, people should be involved (as opposed to merely present) in operations.
In the digital age, the difference between centralization and decentralization is actually illogical; distance is irrelevant using computers. People in Cuba and Guam can communicate and work with people in Greenland likely over the internet. Decentralization is relevant to the extent that the operational conditions can be "recognized" by those presumably closer to the activity. Decentralization for me is actually a call for more human involvement. People can offer the "lived experiences" and reflections that would likely evade computers. Humans are not just thinkers but feelers endowed with all sorts of motivation to change the world. As part of that involvement, there has to be reduced centrality in terms of ontology. Decentralization probably wouldn't work well or make much sense without reduced centrality. Yet the direction of technology development, I suggest as indicated by the increasing absence of humans in production, is towards greater centrality. The commandeering of ontology will be reflected in the data that is collected - its narrow contextual focus - its non-adaptability to changing needs - its market alienation - giving rise to problems that seem almost negligent and most certainly bad for business. Data scientists should ensure that phenomena can be portrayed in data beyond their immediate analytical requirements. It might be necessary to extract or mine for data of apparent value. However, it is also important to respect the complex multiplicities inherent in phenomena. There should be more willingness to consider the internal ontology of data.