Addition of Different Dimensions to Data

I was often the lone wolf among my peers in university because I supported a prominent place in society for corporations and an important social role for capital. I questioned whether the directors and executives of companies entered boardrooms really intending to “oppress” people such as minorities and people with disabilities. Did they deliberately make bathrooms inaccessible to people in wheelchairs, perhaps to advance their preconceptions of who gets to go to the bathroom, I pondered aloud. Well, there are certainly individuals in companies with attitudes that might be blamed for certain unfortunate events. There might also be a corporate culture of insensitivity. However, I emphasize the importance of data systems in decision-making processes. When decisions seem to be disassociated from reality, I consider it worthwhile to examine the data system, environment, and organizational setting. I find that disassociated decision-making is a routine occurrence. A company will produce something that people will stop buying or not buy. I propose here that “control” even among companies is something evasive due to the absence of sophistication in their data systems. In this blog, the system that I will be addressing is the data itself, which might not appear to be much of a system on the surface. This is the whole point. The data can serve as much more of a system to enable greater levels of control. It must have the ability to capture reality in a greater number of dimensions relevant to organizations.

The insulation of reasoning from the outside world is comparable to alcoholism or drug impairment; this provides a synthetic or artificial form of control. It is a worthwhile struggle to overcome such adverse conditions. However, I would suggest that it is actually technically difficult or challenging to arrive at options and alternatives; to give hope of efficacy in terms of different courses of action. So I would like to share some of my own efforts and offer different ideas to others in the community. I don’t approach this as an authority but just as somebody who has walked down nearby paths in somewhat different directions. About a year ago, I wrote a short program to add up events and provide me with daily running balances. The development of this program has been rapid and unexpected, certainly beyond the scope of my original plans. Rather than write about the program itself, I intend to cover some of the underlying concepts behind it. Still, I will present some files from the prototype to substantiate and reinforce the ideas. My main focus in this blog is data-embodiment. I have used this term in the past albeit without design information. I consider the conceptual structures behind the coding broadly applicable for those developing their own systems; as such, an examination not so much of coding but design might help others on their journey. My prototype is called “Tendril.” At this time it is written only in the Java programming language. I will only mention it a few times in this blog.

An “event” is something relevant that is worth recording. For instance, a charity might consider donations important: along with the amount, it would be useful to know when donations were made, by whom, from where, and perhaps the fund-raising circumstances. Usually when we consider data, we do not distinguish what we can control from what we want to control; instead, we retain considerable amounts of information on details that we cannot control. However, even in the case of a charity where control over operational outcomes might not be the main consideration, it can still be useful to distinguish between the level of giving (the context) and the setting in which those contributions occur (the events). It is simply a bit confusing to delineate between events and context. I’m going to set aside the question of separation since this is really a matter of strategy and judgement; moreover, just to immediately confuse the issue, a context can be made into an event; of course an event can be made into a context; and all sorts of non-control factors can be distributed as events.

In the real-life example from Tendril to be covered shortly, the “events” include my choice of driving routes, weather, and road conditions. The “context” is how quickly I reach my destination. By and large, there is good conceptual separation between the events that can be controlled (the choices made during the drive) and the context in which control occurs (the need to minimize the duration of the drive). The least logical choice for a context would be weather since there is no hope of me influencing it (that is to say, avoiding it or making choices to alter it). However, a person can make driving choices in light of weather in order to shorten a drive. The diagram to follow shows the connection between events and context. Another term that I will be using for events is the “body of data” or just “body”; “head of data” or “head” for the context. I will be lopping off terms such as decapitated, headless, and disembodied data. In other blogs, I already made use of the term “data-embodiment,” by which I mean the formation of complex structures that include events and their different contexts. Data-embodiment is about the development of informational constructs.

I can’t take credit for the structure shown above, only perhaps for its specific portrayal. The structure already exists in the data science community: “Are you satisfied with this product? Please help us understand why by checking the points that apply.” Customer satisfaction can be regarded as a type of context: organizations hope to make customers satisfied through particular events. The “base-structure” for embodiment can therefore be found in customer satisfaction surveys. It can also be found in checklists: for the achievement of an acceptable standard (the context), it might be necessary for specific conditions (the events) to exist. My conceptualization of embodiment is meant to accommodate massive amounts of data; however, much less data can be handled and interpreted in a similar fashion. It is possible to have just a single event and context, making conflation likely for those easily tempted by simplicity. It is easy to confuse the event of social membership with the context of contributing to society; driving a car with getting to work; having friends with partying. I actually recommend that people practice bifurcation to assist in the development of critical thinking; it might also contribute to effective base-structuring.

In the next illustration, I set a number of contexts side-by-side. The purpose of doing so is to establish progression. I find that the idea of “progression” (not to be confused with progress) often requires some thought. A simple example of algorithmic progression is duration: in such a scenario, different time ranges can serve as contexts. Some might say that defining time ranges is hardly algorithmic, which is quite true. The definition of time ranges is not progression. Having a contextual array in order to distribute events is progression. Compiling these events through contextual gradients is algorithmic. In a complex organization with competing needs and objectives, progression is probably less straightforward. However, notice how the level of discourse has become substantial: rather than simply discussing something flat and mono-dimensional such as sales, the conversation is about contributing events and perceptions of progression that might go well beyond the physical sale of merchandise. The idea is that we cannot control sales per se but rather the events; we cannot perceive the satisfaction of human desire except through elaborate instruments such as progression. I know that some might suggest that a perceptible increase in sales indicates an appeasement of human desire. Sales don’t measure appeasement per se. A person can buy something purely on expectation and preconception. Disembodied metrics provide little explanation of underlying phenomena.

There can be an unlimited number of “events.”  How exactly these events should be distributed given different circumstances is a topic that I will leave for some other blog.  The programming term that I use to describe a context is a “counter.”  A group of counters associated in a contextual gradient or flow pattern is a “monitor.”  In practice, I tend to have many monitors under a particular “program.”  In the next image, I try to convey the idea that our perceptions of phenomena can be supported by multiple monitors giving rise to contextual multiplicity.  In another blog, I identified the different conceptual flows of information in an organization:  projection, direction, and articulation.  I said that big data might find a good market in internal articulation due to the need to make organizational sense of the environment.  I added that in articulation, the “metrics of phenomena” are most relevant.  I know this probably sounds purely philosophical, but actually I was giving design guidance.  There is no limit to contextual multiplicity.  This is the nature of the metrics of phenomena.  Sensitization and pathological desensitization are worthwhile considerations affecting the metrics, I guess also to be discussed on some future blog.  I just want to point out that the presence of a conceptual construct for big data does not necessarily mean that it will work well:  good design is still important.
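To make the vocabulary concrete, here is a minimal Java sketch of a counter and a monitor. It is not Tendril's code: the class names, the use of duration bounds as the gradient, and the simple tally are all assumptions made for illustration. Each counter is a context (a head of data) that tallies the events distributed to it; the monitor arranges several counters in order and sends each day's body of events to the counter that matches the observed context value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A context ("head of data"): tallies the events distributed to it. Hypothetical name. */
final class Counter {
    final String label;
    private final Map<String, Integer> tally = new HashMap<>();
    private int bodies = 0; // how many bodies of events landed on this counter

    Counter(String label) { this.label = label; }

    void distribute(List<String> body) {
        bodies++;
        for (String event : body) {
            tally.merge(event, 1, Integer::sum);
        }
    }

    int count(String event) { return tally.getOrDefault(event, 0); }
    int bodies() { return bodies; }
}

/** A group of counters arranged as a gradient; here, by duration in minutes (an assumption). */
final class Monitor {
    private final int[] upperBounds; // inclusive upper bound per counter, ascending
    private final List<Counter> counters = new ArrayList<>();

    /** One counter per bound, plus a catch-all counter for anything longer. */
    Monitor(int... upperBounds) {
        this.upperBounds = upperBounds.clone();
        for (int bound : upperBounds) {
            counters.add(new Counter("up to " + bound + " min"));
        }
        counters.add(new Counter("longer"));
    }

    /** Picks the first counter whose bound covers the observed context value. */
    Counter counterFor(int minutes) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (minutes <= upperBounds[i]) return counters.get(i);
        }
        return counters.get(counters.size() - 1);
    }

    /** Distributes one body of events to the counter matching its context. */
    void record(int minutes, List<String> body) {
        counterFor(minutes).distribute(body);
    }

    public static void main(String[] args) {
        Monitor commute = new Monitor(60, 75, 90);
        // Illustrative bodies only; the second one is invented for the demo.
        commute.record(75, Arrays.asList("tuesday", "sunny", "dry"));
        commute.record(55, Arrays.asList("monday", "dry"));
        System.out.println(commute.counterFor(75).count("dry"));
    }
}
```

A "program" in this vocabulary would hold several such monitors, each built around a different context, which is where contextual multiplicity comes from.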

I would now like to introduce my real-life example. I think of all the different blog topics a person might write about, computer code is perhaps the driest. Fortunately in terms of the prototype, the supporting concepts are fairly coherent; or at least I hope others find it so. I will give my example in relation to a special box. In trying to name this box, I could only think of Pandora’s Box. Apart from the name already being in use in folklore, I thought it might also be the name of a retail outlet or product that holds jewelry. So I decided to call my box "Pandora’s Unified Epistemology Box." Being a rather lengthy name for a box, I will simply refer to it as "the box" with my apologies for having such poor marketing skills.

I chose to present my commuting database since it is fairly new; consequently, its design remains rudimentary. It contains information about my choice of roads during the commute to work and home. To the left of the illustration to follow, I present the events or body of data for a single day. I distributed these events on [tuesday] [may] [day6]. The commute was about [ride75] minutes in duration (rounded to the nearest 5 minutes). It was rather [sunny] and [dry] outside. I left work at [1641] and got home at [1755]. There was quite a lot of roadwork taking place at the time: [desurfacelots]. There can be an unlimited number of events. To the right, I show three different counters or heads of data. Each counter is meant to be used in the context of a 75-minute drive home. I find that my driving times vary a lot. At the moment, factors such as Wednesdays, non-dry road conditions, and dim lighting measurably lengthen my drive. I maintain contextual multiplicities for different road conditions and routes. A common change in route occurs when I stop to get gasoline. There is also a route that I call “evade” meant for situations involving blocked roads. The question of what constitutes a “route” is actually a strategic design decision that is not necessarily straightforward.
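The bracketed tokens above can be read mechanically. The sketch below assumes that a day's body of data is stored as a single line of bracketed tokens and that the duration token always has the form "ride" followed by minutes; both are assumptions drawn from the notation in this post, not a description of Tendril's actual file format. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads a body of events written as bracketed tokens. Hypothetical helper. */
final class EventBody {
    // Anything between one pair of square brackets counts as one event.
    private static final Pattern TOKEN = Pattern.compile("\\[([^\\[\\]]+)\\]");
    // Assumed convention: the duration event looks like "ride75".
    private static final Pattern RIDE = Pattern.compile("ride(\\d+)");

    /** Returns the events in the order they appear on the line. */
    static List<String> parse(String line) {
        List<String> events = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            events.add(m.group(1));
        }
        return events;
    }

    /** Returns the ride duration in minutes, or -1 when no ride token is present. */
    static int rideMinutes(List<String> events) {
        for (String event : events) {
            Matcher m = RIDE.matcher(event);
            if (m.matches()) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String day = "[tuesday] [may] [day6] [ride75] [sunny] [dry] [1641] [1755] [desurfacelots]";
        List<String> body = parse(day);
        System.out.println(body + " -> " + rideMinutes(body) + " minutes");
    }
}
```

The duration pulled from the ride token is what selects the counter; the remaining tokens are the events that counter then tallies.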

The decision-making and situational construct can be characterized as a 3-dimensional box as shown above. Driving time is apparent only on the surface of the box. Now, it goes without saying that another person might record his or her daily driving time and arrive at a similar sort of arrow without using anything resembling my box. Similarly, a person can record sales data without any understanding of the contributing factors. The box is designed to support the embodiment of data. This embodied form is persistent: it can exist indefinitely in the data system to explain the surface metrics. Although it is hardly unusual to have surface metrics, imagine the benefits of maintaining data in its embodied form particularly in an historical context or in relation to the intellectual capital of organizations. Once the metrics are disembodied (or perhaps they were never embodied in the first place), the facts are free-floating in space wandering intangibly like ghosts. I have made it my calling to track these ghosts.

I do not consider myself a professional programmer. I actually have a certain level of mistrust towards computer code. So I frequently find myself incorporating diagnostic checks into my work. In the case of my commuting database, apart from having commuting time as the context, I also distribute events of commuting times, which I know must seem rather redundant. Call me paranoid, but the events of commuting times should follow the contextual gradient rather closely; this is confirmed below with a correlation of -0.996. (It’s negative because I prefer having higher scores for short commutes.) The contextual gradient is a work in progress: that is to say, I am still considering different ways of measuring the extent to which events contribute to contexts. Currently, I use a measurement called the crosswave differential. Using this approach, I can say that the ride home on Monday has been faster than Tuesday; dry road conditions are extremely important; and sunny days actually seem to slow people down. I am not making general statements about human behaviour but rather the dynamics as I encounter them given my driving style and the routes that I take going eastbound on the 401, one of the largest highways in North America.
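For readers who want to run a similar sanity check, here is a small Java sketch of a correlation between two equal-length series. I am assuming the Pearson coefficient, since the post does not say which correlation was used, and the class name and toy numbers are purely illustrative. The crosswave differential itself is not shown because its definition is not given here.

```java
/** Diagnostic check: how closely does one series follow another? Hypothetical helper. */
final class GradientCheck {

    /**
     * Pearson correlation of two equal-length series.
     * Returns NaN when either series has zero variance.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of cross-deviations
        double sxx = 0.0; // sum of squared deviations of x
        double syy = 0.0; // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Toy values: commute minutes against a score that falls as minutes rise.
        double[] minutes = {60, 75, 90};
        double[] score = {3, 2, 1};
        System.out.println(pearson(minutes, score));
    }
}
```

A value near -1 for a score that is designed to rise as commutes get shorter is the reassuring case: the redundant commuting-time events are landing where the contextual gradient says they should.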

Above is my “testing 1, 2, 3; testing-testing” pattern. I am confident in my work to the extent I can determine when it isn’t working. The reason I find myself with a ranking of events is because the data is embodied by events. Without the body of data, there would be no data to provide a ranking. What we describe as data can be handled as a complex structure for the purpose of determining the contribution of events. I indicated in an earlier illustration that it is possible to have many different groups of contexts (contextual multiplicities or “monitors”) to help describe phenomena. These provide the metrics of phenomena. The monitors help us perceive the reality of the phenomena. However, there is no limit to the number of monitors; and it is often worthwhile to have at least several. While certain events might contribute positively on some monitors, the same events might register negatively on others. In other words, there might be competing risk-reward dynamics. I consider event-distribution and -recognition to be fairly detailed topics; so I will not cover them here. Perhaps I will elaborate on some other blog.

I said that events tend to reflect the things that we can control while contexts tend to be things we would like to control. This is a general rule of thumb that is subject to creative discretion. Having placed commuting times into the event distribution for diagnostic purposes, it seems reasonable in practice to incorporate various metrics in the distribution to determine the contextual association. A person could for instance distribute stock prices, incidents of vandalism, diesel consumption, and other metrics as events, leaving the choice of contextual realities to influence the portrayal of how the events fit. However, something such as vandalism is hardly a simple event. I describe it as a symbolic aggregate; that is to say, there are probably many events already embedded within it interacting with each other. The results are particularly intriguing if the events are expressed as a coherent gradient: it then becomes possible to speculate on apparent causality between the pattern of events and the contextual gradient. I have an interesting simulation where I use Tendril to guess how much of certain key gourmet coffee ingredients customers prefer given a stream of simple scores: e.g. 1 terrible up to 20 fantastic. I personally find the program effective. It is possible to determine which specific product characteristics allow for the highest prices. Only a fairly sophisticated “detection” system (event-distribution strategy) can record the elements that go into an order. It would be necessary to “check all that apply,” bringing us back to the base-structure for data embodiment.

Organizations tempted by simplicity will likely be disadvantaged in the emerging complex data paradigm. In this discussion of adding dimensions to data, I hope it is apparent that the data itself is becoming more complicated in a structural sense. Therefore “the data” requires some thought in terms of how to address real-life problems. The structural challenges concerning the data are real. I feel it represents a field of study all on its own. I haven’t really invoked statistics except on the diagram for the Event-Context Efficacy Test. I consider the event interactions statistically evasive although I don’t dismiss the idea of trying to make use of statistics. Statistics just seems a bit out of its element. “I stopped for gasoline on Wednesday afternoon. I decided to take the collector lanes on my way home from work. It was raining. There were tractor-trailers all over the place. Drivers were weaving in and out. Some of them looked like they were going to snap. I kind of wondered when I would get home.” I consider this a pretty hostile setting for statistics. Still, given advances in technology, it is hardly my place to dispute how statistics might be used to sift through event data. In terms of how to make estimates in an embodied data environment, I am still working on this; but I admit that I do it all the time using the event gradients as proximity references or “landmarks.” “I think I can pick you up at around 6 PM today.” I didn’t snatch that estimate from outer space.

So I don’t rely on statistics for handling embodied data. I don’t consider the crosswave differential particularly statistical in nature; it is a tool more to differentiate between events and ambient conditions. I guess the point is debatable. There must certainly be a means to confirm whether or not the structural design seems to be delivering noticeable improvement in real life and for diagnostic purposes; I think statistics has an important role in this respect. How I design the system of distributing events and assigning contexts might be completely different from how another person decides to approach the same problem. Despite the added dimensions to the data, there is frequently a need to “confirm performance”; I have described this imposition on data as the “metrics of criteria.” There is a balance between criteria and phenomena because in the human world, many things that exist for humans do so for a purpose. It is not enough just to have embodied data. There must be systemic embodiment. The data is part of a human system such as production. Nonetheless, I would say that for the most part, the balance has not existed. The role of data has been largely instrumental and criteria-driven. So I hope that others might consider embodiment in the articulation of operational and environmental conditions.

Tags: alignment, analysis, applied, articulation, assisted, big, computer, congruence, construction, data, design, determinism, ecosystemic, embodiment, environmental, management, methodologies, model, organizational, possibilism, social, statistics, strategic, structural, systems, theory

