Being the son of a mechanic, I have spent many years handling power tools. I'm especially fond of a couple of hammer-drills in my possession. They can effortlessly drill holes through concrete - at least, this is what my father once claimed when he handed down his most treasured tools to me. I'm big on pliers and screwdrivers, perhaps due to my vocational training as a technician. Even today - long after I completed my diploma and continued to further my education - I still carry a licence allowing me to handle gas appliances in Ontario under 400,000 BTUH.

While I received formal training on the use of tools, my ensuing social perspective on tools is probably a bit unique. It opens up a bigger discussion on the subject of "instrumentalism in data" or "data instrumentalism," as I am more likely to say. Instrumentalism is something that can both enable and disable. It can, for instance, help improve how a person shovels coal by focusing on key aspects of production at a particular level - that is, the most apparent. But these days, some might regard the use of a shovel to physically extract coal as inherently unproductive: heavy machinery can probably do the job better and faster. Data instrumentalism can cause an organization to generate metrics that fit its preconceptions of progress, irrespective of the underlying reality. In this blog, I will consider the possibility that instrumentalism has become pervasive because it is "structural" rather than merely ideological. The objectively verifiable constructs and technologies supporting a data system can bring about instrumentalism through a lack of awareness of adverse consequences; but it can also result from underdevelopment and poor design.
Tool Conversion Principle
I would argue that the basic idea behind a tool is as follows: a tool converts the diverse attributes of a person into specific actions intended to achieve particular outcomes from the tool. It is irrelevant, for example, whether the person handling a screwdriver has no education or several university degrees. A wrench works if the user can apply the correct action: it hardly matters whether the person is a great piano player, a firefighter, or a mechanic. Of course, some people might be able to perform the required actions better than others. Similarly in a workplace - given our ability to rapidly replace workers - the attributes of a person have become less important than the extent to which he or she can generate conforming behaviours. The organizational structure represents a sophisticated type of tool: we use it and become used by it. The more complex and integrated an organization must be to fulfill its design, the more tightly its employees must work within set parameters. Everybody performs their duties in specific ways and often at a particular pace to achieve the intended outcomes. In this environment, we can say that people make use of tools; but also, the tools make use of people. The types of tools that people are given or compelled to use dictate, to some extent, their behaviours.
Consider the basic attributes of tool conversion as they relate to data: an agent is asked to perform specific tasks - and only these tasks - to obtain particular outcomes and no other outcomes; data is gathered, but only the data pertaining to those specific tasks and outcomes. I found that a bit tedious to write, but I hope readers catch the gist. At no point am I saying that data-collection is unimportant. It is extremely important. However, the nature of the data is shaped by the desired outcomes and objectives. A tool best performs what it was designed to do. The highly constrained nature of data extends from its instrumental use. The user reduces the proxy such that it represents the most instrumental aspects of the underlying phenomena; his or her actions lead to contextually constrained data. The user, as an agent of the environment, exhibits conforming behaviours. Data becomes an instrument of conformance to accomplish the predefined tasks of an organization.
Somebody or something must "become" a tool in order to use a tool to achieve "specific" outcomes (extreme emphasis in quotes). There would still be outcomes if resources were not treated like tools, but these might not be the "specific" outcomes originally sought. In light of this principle, the size and complexity of an organization can lead to high levels of instrumentalism. As an organization grows in complexity - spatially, functionally, and structurally - there is increased risk of disagreement among its individual components. Thus, instrumentalism is about defining how things operate and imposing rigid controls; this can deliver compliant functionality at the cost of autonomy. Many years ago, clocks and watches were amazing specimens of design containing many moving parts working together. Each part had a role to play in the overall process. We might conflate a part with the role that it plays. The role is so rigid that it could be played by nothing else but the intended part. Of course, any part could be changed - but not the role that it plays. So in relation to tools, it is necessary to bring about conformity in ourselves to make use of a tool as intended. The act of conforming is the result of our becoming tools, that we might be used as intended. The data that we gather becomes part of this process of "projection," as I will explain further.
I will provide a conceptual explanation followed by a simulation. The general concept is as follows: as people perform a greater number of tasks together as parts of an organization, additional controls must be applied to ensure that their behaviours lead to the intended productive outcomes. Expressed a bit differently, as the demand for specific productive outcomes increases, people have less opportunity to exhibit autonomous behaviours; they must cede more control to the production environment in order to participate as a collective. Personal autonomy tends to decline as production demands increase.
I don't know how many readers have had the opportunity to work at an assembly or production line. I have never been opposed to doing this sort of work myself. It requires considerable self-control and intellectual peace - a willingness to accept the guidance of repetitive external forces. I used to load skids by taking boxes off a conveyor belt. A line travels at a particular pace, and the inability of any person to maintain that pace quickly becomes apparent to other workers. The speed of the line regulates behaviours - or at least it necessitates certain types of behaviours. But the underlying control - for it determines the speed of the line - is the production target. Here, then, is how data-collection can control people. Does it seem like instrumentalism? A worker might be reduced to a few specific behaviours to achieve particular goals. I recall a coworker at the production line who was also a professional artist. He explained how he had successfully sold a number of his paintings at art exhibits. So when one of his hands fell deep between a couple of conveyor belt rollers - stripping off the skin - the accident meant more to him than simply being unable to pick up boxes. However, he did not appear in the data except in the metrics of production - its behavioural criteria and targets.
I was wondering how I might substantiate productive instrumentalism given the lack of organizational data. I decided to provide a rather mathematical explanation. Below are some plumes produced by "nomads": a nomad is a sprite designed to walk across a graphics array using random y-axis shifts. For the plume on the left, the nomads walked from left to right, creating a relatively random-looking distribution. For the plume on the right, the nomads started from a particular point or area on the right, and I programmed them to walk left. Another way to look at the right image is as follows: imagine many workers starting at the left, demonstrating all sorts of behaviours in order to achieve specific production outcomes on the right.
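Only a portion of my code is attached to the blog, so here is a minimal, self-contained sketch of how such nomads might be generated. The class and method names are illustrative, not taken from my actual Complexity.java:

```java
import java.util.Random;

// Minimal sketch of a "nomad" walk (illustrative names, not the original
// Complexity.java). A nomad crosses a grid of the given width, shifting
// its y-position randomly at each step.
public class NomadSketch {

    // Free plume: walk left to right from startY; the path may end anywhere.
    static int[] walk(int width, int startY, long seed) {
        Random rng = new Random(seed);
        int[] path = new int[width];
        int y = startY;
        for (int x = 0; x < width; x++) {
            y += rng.nextInt(3) - 1; // random y-axis shift: -1, 0, or +1
            path[x] = y;
        }
        return path;
    }

    // Convergent plume: start the walk AT the outcome on the right edge and
    // walk leftward; reversed into left-to-right order, every path ends at
    // (or within one step of) the intended outcome.
    static int[] convergentWalk(int width, int targetY, long seed) {
        int[] leftward = walk(width, targetY, seed);
        int[] path = new int[width];
        for (int x = 0; x < width; x++) {
            path[x] = leftward[width - 1 - x]; // reverse the direction of travel
        }
        return path;
    }

    public static void main(String[] args) {
        int[] free = walk(50, 0, 42L);            // may drift anywhere
        int[] conv = convergentWalk(50, 10, 42L); // terminates near y = 10
        System.out.println("free ends at y=" + free[49]
                + ", convergent ends at y=" + conv[49]);
    }
}
```

The convergent trick mirrors what I described above: because the walk is generated from the outcome outward, reversing it guarantees that every path, read left to right, terminates at the specific outcome - no matter how erratic its earlier behaviour looks.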
I realize the simulation is rather simplistic. Perhaps some readers might accept my simple assertion at face value: people naturally exhibit all sorts of behaviours, but systemic controls must be in place for them to achieve "specific" outcomes. A specific outcome is an outcome that cannot reliably happen by chance. The more complex the required outcome - for example, producing an automobile - the greater the risks associated with leaving behaviours to chance. Random behaviours are unlikely to lead to deliberate outcomes. They can still do so by accident, of course. However, productive complexity necessitates sophisticated and rigid operational controls. I therefore argue that production is inherently instrumental. As companies grow in size, make use of computers, and engage their competitors, they have to incorporate many processes into their structural capital for the sake of efficiency. This gives rise to an environment where workers have little personal autonomy. Similarly, a society handling all sorts of complex and competing interests tends to have more laws and more enforcement of laws: neither laws nor enforcement would exist if similar outcomes could be achieved through happenstance and autonomy.
I ran the nomad simulation a number of times, allowing for increasingly more spread at the right, indicating greater personal autonomy. Then I compared the level of similarity between the convergent patterns and their non-converging counterparts. "Compliance" indicated below is the percentage similarity. The chart basically shows that as more autonomy is allowed, the level of similarity between the convergent and non-converging plumes increases. Worded differently, if we had to achieve a highly complex convergent pattern, compliance would worsen through reduced autonomy; this gives rise to the need for greater control. If we want people to follow a detailed regimen of behaviours, and assuming production depends on adherence, then it is necessary to reduce autonomy.
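For readers who want something concrete, here is one way such a percentage similarity could be computed. This is a sketch of the idea rather than my exact formula: it counts the proportion of x-positions at which two plumes lie within a small tolerance band of each other.

```java
// One plausible "compliance" metric (assumed; not the exact formula behind
// the chart): the percentage of x-positions where two paths' y-values fall
// within a given tolerance of each other.
public class ComplianceSketch {

    static double compliance(int[] a, int[] b, int tolerance) {
        int matches = 0;
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) <= tolerance) {
                matches++; // this position counts as "similar"
            }
        }
        return 100.0 * matches / a.length;
    }

    public static void main(String[] args) {
        // Positions 0, 1, and 3 are within tolerance 1; positions 2 and 4 are not.
        int[] a = {0, 1, 2, 3, 4};
        int[] b = {0, 1, 5, 3, 9};
        System.out.println(compliance(a, b, 1)); // prints 60.0
    }
}
```

Under this metric, widening the tolerance band plays the role of granting autonomy: the looser the band, the easier it is for a free path and a convergent path to register as similar.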
I know that it seems, at least in the illustration, that greater compliance is achieved through increased autonomy. This is a bit like saying that everybody is innocent if we have no laws; so in order to reduce crime we should have fewer laws. (I added this just to confuse readers, really.) However, both I and the illustration are saying that as people lose autonomy, they are inclined to break more rules unless there are effective controls. Autonomy is related to the amount of spread at the right-hand side of the plume: a tighter spread is associated with less autonomy. If we want people to achieve specific goals, the spread must tighten. Since this causes compliance to decline, assuming people behave like these nomadic sprites, it would be necessary to apply more control. I'm glad that I can reflect on the code in these situations. By the way, for those interested in the code, a portion of it is attached to the blog (Complexity.java). Since a random generator is used, the plumes that others might produce will differ. Again, I hope that people take my general contention at face value, since the next part of the blog builds upon it.
The main part of this blog pertains to instrumentalism dealing with data perhaps on a more abstract or symbolic level. The idea of "structural capital" relates to how people within an organization can change (through recruitment and attrition); and yet the operations continue as before. There are forces in the background that bring about conformance behaviours. Many of us are familiar with the experience of being trained to perform particular duties. The trainer was trained by others. Yet the head office of an organization located in another state or province does not witness any of these activities. A manager might not be aware of all the day-to-day events taking place in the office. The process of maintaining the operating structure of the organization is based on flows of data. A manager certainly can't recall every detail of every event: this person would refer to records and particularly statistics.
I don't have any managerial responsibilities myself. But I collect lots of data. I am always surprised by how my personal opinions can sometimes conflict with the data that I collect. This has led me to conclude that nobody should rely on gut instincts if real data can be collected. At the same time, I realize that the data gathered and what the data says are determined by design; this is the case in relation to conventional prescriptive metrics. I believe that at some point, it should be unlawful for adverse decisions regarding employees to be made in the absence of reliable data. Given the persistence of data and its relevance to the lives of people, stakeholders should be able to review and question the facts leading up to decisions in the determination of compensation and on matters of employment history. I will cover this in some other blog.
When data is collected, not just any data will do. "Josh parked a foot from the fence," is certainly data, although it doesn't seem all that business-related. Criteria will be set to determine whether or not particular events represent conforming data. Therefore, the "number of mistakes made" will tend to receive more attention than the "brand of jam" in Josh's lunch. I am not writing about any Josh that I might be working with, by the way. The jam and the mistakes are both present in the workplace. The choice of criteria determines the ontological relevance of events, giving rise to data mostly about mistakes rather than jam. (Josh, of course, might be slightly allergic to the jam, causing him to make more mistakes.) This is the general concept of the "metrics of criteria": data emerges from the application of criteria - but mostly certain types of data. The data indicates compliance: it can be used to ensure conformity and identify apparent deviations. I have described this process as "projection," a type of disablement. I think that many people would also recognize how this is a process of "alienation," leading to distance between data and its underlying phenomena. An organization brings into its substantive discourse only certain aspects of the environment.
I find it tempting to approach the point of instrumental data from a purely polemic standpoint. However, in this blog my focus is really on the structural contributors and determinants. I said that in an organization, workers change all the time, but the structural capital ensures the continuation of productive behaviours. I also said that the process of conformance and control is driven by the metrics of criteria. The suggestion that projection can occur "structurally" implies that it should be possible to find persistent articles or artifacts in the production setting giving rise to compliance.
Classification of Data: The meaning of data in a particular context tends to be constrained. I believe that many people would regard the term "pigeon-holing" negatively. It just doesn't seem like a pleasant thing to do to anyone or with anything. Nonetheless, the idea of having fixed slots to hold predefined data types seems pervasive. Some organizations store their data in relational databases, where pigeon-holing is common. For instance, a database might contain slots to hold sales data. If we discover over the course of routine data-collection that some clients were "troubled" or "anxious," then, depending on the structure of the database at the time, these facts might have to be discarded. Therefore, in certain database environments, the failure to account for everything during design can contribute to exclusion later. This seems ironic given that the whole point of gathering data is to learn about matters not already known. I recall once handling calls from customers about their appliances. I was required to select the "nature of the call" from a lengthy drop-down menu. I consider the approach rather ineffective: it can contribute to losses and liability. (Do I have any data-oriented solutions in mind? Yes, although I won't introduce them on this blog on data instrumentalism. I'll save them for a future blog.)
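To make the pigeon-holing concrete, here is a small hypothetical sketch of the drop-down problem: the intake accepts only predefined categories, so an unanticipated observation has nowhere to go and is simply lost. The category names are invented for illustration.

```java
import java.util.Optional;

// Hypothetical illustration of schema "pigeon-holing": the intake form only
// accepts predefined categories, so an unanticipated observation such as
// "customer seemed anxious" cannot be recorded and is silently discarded.
public class CallIntake {

    enum CallNature { NO_HEAT, PILOT_OUT, BILLING }

    // Returns the stored category, or empty when the observation fits no slot.
    static Optional<CallNature> record(String observation) {
        switch (observation) {
            case "no heat":   return Optional.of(CallNature.NO_HEAT);
            case "pilot out": return Optional.of(CallNature.PILOT_OUT);
            case "billing":   return Optional.of(CallNature.BILLING);
            default:          return Optional.empty(); // excluded by design
        }
    }

    public static void main(String[] args) {
        System.out.println(record("pilot out"));        // stored in its slot
        System.out.println(record("customer anxious")); // lost - no slot exists
    }
}
```

The failure mode is quiet: nothing crashes, no error is logged; the organization simply never learns that the customer was anxious.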
Amount of Data: The amount of data tends to be limited. By "amount," I mean the aspects (i.e. columns) rather than number of events (i.e. rows). This is a bit like the idea of slots (limited room for expression) except that the structural constraint is more fundamental (inadequate room for expansion). At some point, it might be necessary to add information about something not anticipated during the design of the database; or to add to a certain aspect of something anticipated but never fully rationalized; or where the circumstances have since changed. For instance, a database could be designed to record information about bedding replacement: e.g. "Changed bedding? Please check yes or no." However, a mitigation program might require that bedding be periodically tested and specially treated; this creates a need to hold data beyond what the system was originally designed to handle. I'm not saying that it's impossible to get around the problem perhaps using bulk migration or by linking references to other files. I merely underline the overhead involved in accommodating new data. After expanding the database, there might be compatibility issues.
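As a sketch of the expansion problem, consider a record designed with a single yes/no aspect. One common workaround - assumed here for illustration, not a recommendation - is an auxiliary key-value store that absorbs aspects the original design never anticipated, at the cost of losing the structure and validation a proper column would have.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the expansion problem: the fixed record has room for exactly one
// aspect ("Changed bedding? yes/no"), while a side key-value map (a common
// workaround, assumed here) absorbs fields the design never anticipated.
public class BeddingRecord {

    final boolean changed;                              // the only planned aspect
    final Map<String, String> extra = new HashMap<>();  // later-added aspects

    BeddingRecord(boolean changed) {
        this.changed = changed;
    }

    public static void main(String[] args) {
        BeddingRecord r = new BeddingRecord(true);
        // The mitigation program now requires data the schema never planned for:
        r.extra.put("tested", "yes");
        r.extra.put("treatment", "thermal");
        System.out.println("changed=" + r.changed + " extra=" + r.extra);
    }
}
```

The workaround carries exactly the overhead described above: the new aspects are untyped strings, invisible to the original reports, and any consumer of the record must be updated to know they exist.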
Contextual Relevance of Data: Data generally lacks contextual information. The data held by an accounting department is easy to distinguish from the data held by shipping. The "separation" of organizational functions tends to maintain a contextual separation of the data. However, if an organization tried to preserve and make active use of contexts, this would open up the deep issue of what it means for data to be contextualized. This brings me to my colourful saying that data is often "headless"; it can be found in this horrific state, floating or wandering aimlessly in the data system. I believe that a sophisticated data system should be designed to hold hundreds or even thousands of different contexts associated with particular events. In doing so, an event that seems entirely shipping-related can have multifarious organizational impacts in the form of contextual outcomes. If there is a single overriding context, this indicates instrumentalism. The data would tend to convey only the details it was expected to convey. I believe that this isn't a purely philosophical or ideological obstacle; it is also structural.
Instrumental Criteria: On the surface, it might not seem that "sales" is the product of criteria. The act of purchasing a product could generate an enormous number of data events: e.g. desire to do well in school; need to blend in with other students; anticipated physical changes to the human body. (I am of course referring to back-to-school products.) Yet retailers study something of questionable predictive relevance: the revenues collected during sale. It must seem like the collection of revenue is the sale. I am not saying that this measurement is "completely" pointless. It is useful for financial guidance. However, customers might decide to stop buying products the following year; it then becomes apparent how superfluous the data is. It just so happens, the retailer has established certain criteria, and sales figures are generated through the application of these criteria. "I want to know how much money I collected after I sold this to the customer." "I want to know how much money was made during this time period." I consider it healthy for data scientists to examine data in more critical terms. If the data is primarily criteria-driven, it becomes necessary to question the criteria itself. For me, the problem isn't so much that there were faults with the original criteria. Over time, all criteria can become faulty since the environment is unlikely to remain static. The real problem is the inability to adapt criteria to the changing realities.
Poor Conceptualization: Before criteria is set, or perhaps even as an alternative to formal criteria, it is necessary to determine what events "might be" important. I covered this point in a previous blog on the geography of data dealing with transpositionalism. Take for example the idea of following a particular course of treatment. An interesting question might be as follows: is this medication good to take all spring, summer, fall, and winter? I'm not actually suggesting that seasonality is necessarily important; this is merely the sort of question one might ask. The change in seasons is a major event for living organisms occupying different climates; for humans, this influences indoor environmental conditions and the availability of indigenous varieties of food. Is it "good" for a person to take the exact same treatment if he or she is a librarian or a firefighter? Of course, if we refer to the product label, it probably doesn't distinguish between different seasons or occupations; in fact, it probably doesn't alter the recommended dosage even based on weight. Peanut butter seems to be fine for many people to eat; but it can kill a certain segment of the population. Having prescribed normatives of a person creates normatives of solutions and preconceptions pertaining to that individual. But not everybody fits the norm. Similarly, an organization should gather data and make use of data from the reality that it confronts rather than discussing realities faced by other organizations.
Scientific Method: Data can become instrumental if its collection and recognition (as something materially substantive) is designed to satisfy a methodological normative. My point is therefore not necessarily limited to the Scientific Method. Within the context of this method, there is experimentation, deliberation, debate, reexamination, revision, and often restatement. It is a highly rational process driven by many big wheels. It can take some time to get anything done. If we inject massive amounts of data into this process, the defining nature of the methodology becomes more apparent. Researchers often want to substantiate specific points - so they need clean and therefore highly controlled data. Indeed, researchers want the data to be convincing. In the process of trying to extract a sliver of light from a big reality, data as a proxy can become both pristine and alienated. Noise is screened out. This is not to say that the noise is unimportant, but rather that its relevance has been dismissed. So people with an incomplete understanding of the phenomena screen out or never collect data, premised apparently on authoritative understanding. Researchers make their point. The wheel turns a little bit. The process repeats. In an environment with many competitors and shrinking markets, one might question the feasibility of such an approach. An organization can go bankrupt long before its environment can be adequately examined for actionable insights.
Insulation of Structural Capital: The previous points dealt with the structures giving rise to instrumental data. For any number of reasons, the policies, practices, and philosophies of an organization might steer the data in particular directions, perhaps as a matter of strategic choice in order to accomplish particular outcomes. This might be described as the structural manifestation of ideology. There isn't really a shortage of examples, but I can share one that immediately comes to mind. There is a well-known human rights case involving the Toronto Transit Commission and a visually-impaired customer. This customer wanted the TTC to announce all stops. I have been on the TTC during packed hours when visibility was terrible even for people with no visual impairment. It is certainly a curious business perspective to consider signals and stop-announcements a problem rather than a selling point for a public transit service. The TTC lost the human rights case. Perhaps the TTC received all sorts of customer feedback encouraging improvement in those days. (Read here "client data.") If the TTC had so much time and money to spend fighting a visually impaired patron in court, one might reasonably question how it addressed "customer feedback" more generally. I haven't used the TTC in many years, and the court cases occurred some time ago; so I can't say whether conditions are the same. However, to me this is an example of structural insulation. One would expect a bridge or a path, but instead there is a wall.
Instrumental vs. Participatory Data
I recall a science teacher once explaining why cells have limited size: he offered a rather mathematical explanation. There are structural impediments given that nutrients are taken in from the outside: the amount of nutritional intake depends on the surface area, which starts to decline in relation to mass as a cell increases in size. Consequently, at some point, geometry prevents the cell from supplying itself with adequate nutrition. While I am not comparing organizations to cells, I believe that math likewise plays an important role. I suggested earlier that as an organization gets larger, if it hopes to carry out specific tasks involving many parts, it must exert more control over those parts. I said that this is probably relevant to any data collected under a prescriptive regime. I also discussed how this control might structurally manifest itself within an organizational context, specifically as it relates to data. I believe that my list is fairly short; there are perhaps many more structural manifestations of data instrumentalism. To me, the crux of the problem is how an organization might be unable to correct itself once it loses direction. Given the insulation of structural capital from the environment, organizational decline seems almost inevitable. I believe this is one of the reasons why organizations sometimes ironically "parachute" in outside consultants - presumably to advise internal experts on how to run their own operations better.
An outside consultant might bring to an organization a fresh set of eyes and possibly a perspective more open to different possibilities. However, if the data itself is instrumental, the likelihood of an outside consultant reaching radically different conclusions seems remote. The data is designed to provide guidance. The structure made the data what it is; so the data will tend to say what the structure intended for it to say. As an escape, the data must instead articulate the environment - both the one that an organization occupies and the one that exists within the organization itself. In many discussions surrounding big data, there seems to be a persistent question of what to include. One reference point is the market - the external environment. But this can't really be the focus. A car manufacturer might study the demand for home repair products and yet be unable to expand into this area. The real question is what the organization can reasonably deliver in light of its current circumstances. This requires input from internal systems - the internal environment - as it relates to different potential markets. The problem is solved not by just any data - for instrumental data cannot help - but rather by the metrics of phenomena gathered through environmental articulation. This ensures that any data gathered can help an organization adapt. I call this alternate type of data that comes from the environment "participatory data." It is a different kind of data: more massive, complex, and organizationally engaging than its instrumental counterpart.
Upcoming Blog - Dark Art of Warping
In a couple of weeks, I will be assigning some real-life personal health events to a proxy - the closing prices for the Toronto Stock Exchange - using a technique that I call warping. Why in the world would anyone want to warp something? Data becomes surreal and freakish in the dark art. I still recall some sage advice offered during a movie: a Jedi must learn to build his or her own light-saber. My next blog will be about building weapons from data. Well, not really. But it's as close as I get to the topic.