The role of statistics in data science is often debated. Despite rapid developments in technology giving access to algorithmically sophisticated approaches, I feel that statistics can still provide many worthwhile insights. If I have a database of sales figures spanning many years, I feel that I can become more aware of historic trends and seasonal patterns through the use of statistics. Statistics offers a sense of state, direction, pace, and progress. Statistics can also enable estimation of future levels. What statistics doesn't do in my opinion is address questions like the following: "How did we get here?" "How do we get ourselves elsewhere?" In this blog, I will be arguing that statistics should not be expected to guide intervention since the "object of intervention" (the thing receiving the attention) hardly "participates" in statistical data. I define participation in data as the extent to which underlying phenomena are exposed to the risks and benefits of intervention directed at the data. These dynamics will become clearer after some explanation, I hope. We should never dismiss the importance of studying state and comparing changes over time. However, making a comparison to describe how something developed is quite different from explaining the reasons why those developments occurred. The academic performance of students can be determined using their grades. This tells us nothing about the situations confronting students that affect their performance. If the objective of an educational system were to improve grades, it would be necessary to gather much more data than just exam and essay scores. Grades reflect how a student interacts with external criteria. The data is not an intrinsic part of the person. If participation exists, its parameters are set by the users of the data rather than those being evaluated.
I believe that most organizations recognize the need to incorporate as much data as possible into decision-making. Whether or not these decisions lead to the intended outcomes is a separate issue. It is possible for managers to make decisions without using any data or even giving their circumstances much thought. Financial commentators periodically mention how random selection can sometimes outperform professional investment managers. The fact that people make use of data does not necessarily mean that the data will be used effectively. It is necessary to distinguish between how managers respond to their data and the outcomes of intervention. There is a need to confirm whether decisions truly contribute positively to organizations. I would argue that the efficacy of decisions depends on the level of participation of the underlying phenomena in data. In political discourse, the term "participation" probably brings to mind public activism and community involvement. In my own research, I found that people can be involved in institutionally mandated processes without necessarily having much real impact. They can be counted and at the same time ignored. Participation need not be effective.
Distance of Object from Data
This brings me to my observation that people can be part of a process and yet not actually participate. A company can maintain sales statistics and yet not have the foggiest idea why people purchase their products; for the statistics say nothing about the reasons. I can count a basket of apples and in the end have or gain little understanding of this fruit; for the purpose of counting is not to understand but only to confirm the number. Although the underlying phenomena can participate in data, this participation doesn't necessarily have to be significant. A metric that is extremely prevalent in Western society is cost. I guess because there is so much money to take into account, almost every situation can be reduced to its cost relevance. It is an alienation of things to characterize existence purely in relation to costs. It can be said that in a data-rich environment that is highly cost-oriented - e.g. in an accounting department - the participation of the deeper phenomena might be minimal. Many workers appear at their workplaces every day as confirmed by their attendance history although their employers might have little understanding of them as individuals. In the image below, I formally separate the object phenomena from its data proxy or its representation in data. Data is a proxy for phenomena. I describe the conceptual overlap as participation.
Figure 1 - Data-Object
Data is merely a representation of the object and not the object itself. This is an important distinction. It is possible to have a lot of worthless information; consequently, an organization might improperly value its intellectual capital and therefore misjudge the likely return on investment. I believe that objects participate in data in different contexts and at many levels. Demographic data collected during the census contain little in terms of object participation. The use of demographic data is defined externally. Those that participate in the census have minimal say over how the information collected should be interpreted and applied to decision-making. Participation of the object in data is important from the standpoint of providing services and to support intervention. If an object is quite far away from its representation in data, there is a limit to how much the needs of the object can be addressed through the use of the data - for instance, for allocating resources. These dynamics are not exclusive to service providers. In a discussion of performance evaluation, scorecards and rankings can be deployed that are quite sophisticated and yet a great distance from the objects being examined. Marx wrote about a form of alienation called reification. It is a routine occurrence in our society for metrics to be elevated irrespective of the distance from underlying phenomena; perhaps in many respects, reification has become structuralized and normalized through our theocratic faith in technology and neoliberal preconceptions of development. Having big data is not the same as having great solutions. Tools don't make a craftsman. A craftsman makes things using tools.
Distance of Data from Subject
I describe the user of data as the subject. This is the person or organization that puts forth the need for data and therefore criteria to enable fulfillment. In data, the interests of both subject and object come into play; it is important to recognize this pervasive and fundamental power struggle at the atomic level of our social reality. Earlier, I said that the object doesn't necessarily have to participate much in data. But actually, the object doesn't have to participate at all. The recognition of data is often determined by the subject. While it hardly seems business-like to do so, those attending a board meeting can propose a course of action lacking substantiating data; this can occur if there are pressing demands for a response. The absence of information does not negate the need to render decisions. Decisions can be made through the use of presumptions, assertions, assumptions, philosophies and ideals, entrenched patterns of reasoning and behaviour, and of course gut instincts. The subject decides how far away the data can be - the distance of the data from the socially constructed forces that define and impose meaning irrespective of underlying phenomena. Given the context of discourse surrounding public participation, I decided to use the term "regulation" to describe its subjective counterpart, as shown on the image below. Therefore, in data there can be participation of the object; and there can also be participation of the subject. I describe the latter as regulation. I think that it is fair to say, when we talk about data, the object-subject distances don't normally enter the picture. I believe this is because data is frequently not interpreted in critical or complex terms.
Figure 2 - Subject-Data-Object
Structural Expansion and Data Distances
The duality of capitalism is sometimes discussed in environmental philosophy. On one hand, it is possible to gain benefits from capitalism; on the other, there can be environmental destruction and harm to people. I know these polemics might make me seem rather radical. I'm actually big on free enterprise. Consider duality as it relates to companies selling products that people don't want to buy; managers making decisions disassociated from the underlying problems they hope to tackle; and marketing departments collecting large amounts of superfluous information. Little of the surrounding reality might participate in the data being collected. The data merely becomes part of the illusion. If we assume a type of expanding recursive behaviour in data, hazards exist in the expansion of organizations as distances increase in the subject-data-object construct. Structural expansion might bring about a need to increase control patterns - philosophies, policies, practices, protocols, and regulations - that lead to an imbalance. So great has been our tendency to be controlled that the participation of objects in data has not kept pace. In my blog post on Transpositional Geography, I explain how the relevance of data to different contexts can be used to map out non-spatial events. I would say that over time through systematization - literally through the expansion of control systems - the transpositional plane has leaned in favour of regulation by the subject; against participation by the object; insulating capital from its environment.
Figure 3 - Expansion and Transposition
Also in an earlier blog, I describe three types of data flows in an organization: projection, direction, and articulation. These data flows, indicated in the illustration above, provide analogs to explain the subject-data-object construct in systemic terms: regulation expands to become projection; participation to become articulation; while the body of data becomes the multifarious events related to processes of production. On an organizational level, I believe that true adaptation is becoming increasingly difficult to achieve due to declining levels of participation in the data. It is quite challenging to adapt while being insulated from the environment within the organization (through lack of internal articulation) and outside it (through lack of external articulation). Handling the subject-data-object construct can affect the functionality of large structures and systems. I hope this perspective provides useful insights in terms of the pathological condition of structural insulation: e.g. how phenomena can be substantively alienated from data systems thereby making successful intervention less likely.
Before proceeding with some case examples, I just want to emphasize the rather routine nature of non-participatory data. The fact that an organization has a plan and closely follows it does not mean that things will improve. Governments by nature deliver institutional responses to problems; this is not necessarily related to problem mitigation or resolution. For instance, it might make perfect sense from a bureaucratic standpoint to promote improved insulation in homes in order to save energy. In the past this has led to class-action lawsuits: the use of urea-formaldehyde comes to mind along with a few other incarnations of energy-saving insulation. Companies sometimes find it necessary to implement restructuring. I ask readers to reflect on how often, how well, and for how long restructured companies actually survive. We need to distinguish between the plan or course of action and the actual outcomes - the proxy and the phenomena. Although intervention might indeed cause data to behave as expected, this does not mean that the reality will do likewise if the level of participation between reality and the data is quite poor. Nobody has ever really questioned the ability of organizations to carry out plans. But whether the successful implementation of such plans is good for society or the organizations themselves is a separate issue. Companies seeking the highest profits possible have suffered against foreign competition: for in a war of minimalism, somebody else can often produce the same products far faster and better. Thus, many organizations have deliberately adopted plans likely to lead to their market termination. So if companies fail, it is not necessarily because their plans fail but rather as a consequence of total success; it just so happens, quasi-intellectualism is often a poor substitute for effective intervention.
A Year of Intervention
My undergraduate thesis dealt with the effectiveness of public participation in local planning; this is quite similar to the topic of this blog. Although my research was specifically about community participation in municipal planning, I feel that subject-data-object dynamics inspired by my undergraduate research are broadly applicable to other areas of concern. Keeping this portability in mind, I would now like to offer some real-life examples from my own data. I have been arguing that effective intervention is quite difficult if there is lack of participation in data. This is a good principle, but there are also practical reasons to regard data more inclusively. For instance, if a person is unaware of what data contributed to the metrics, it might be unclear how to intervene in order to influence the metrics. Another rather routine problem involves the question of how to respond if in fact the data collected seems irrelevant to the metrics: that is to say, intervention was attempted, but the data used in the intervention did not lead to the desired outcomes. One has to determine if the intervention was faulty, or perhaps the data chosen reflects the underlying phenomena poorly. About a year ago, I started collecting a large amount of personal health data. Rather early in the development of the alpha, I decided that I would eventually use the data to support intervention. So I needed the system to provide specific insights. It is inadequate to have charts showing trends - i.e. statistical charts - due to their lack of tangible guidance.
When I originally thought about "improving" my health (through algorithmic intervention), the mission seemed simple. I had to gather data, and then from this data I would respond accordingly. This was the plan - and it made sense at the time. But how does a person respond accordingly to unfamiliar data? Not so much due to effectiveness but rather ease of intervention, I initially experimented with my choice of nutritional supplements. During much of the trial year, I had minimal understanding of intervention. I discovered over the course of these experiments that (1) a person can accidentally discover useful things and be more aware of their usefulness simply by collecting and frequently studying the data. It isn't always necessary to have an elaborate plan of action. I also noticed how (2) the underlying phenomena might not fall fully under the scope of the data being examined. Therefore there is a constant search for different types of data to characterize the phenomena. I then reasoned that (3) the data collected involves my interaction with the phenomena; therefore, the resulting metrics reflect not so much the phenomena themselves but their relationship or response to something else, including but not limited to my intervention. This has given rise to my assertion that the data represents a "stress response." The algorithm that I use to determine the relevance between the data and metrics is premised on stress dynamics. (During my graduate studies, I examined how structural transformations contribute to stress. I might write more about this in the future.)
The image that follows shows my "sleep perceptions": the gradient reflects a combination of factors affecting sleep including heart regularity and depth of restfulness at steady-state conditions. There is a trend line on the graph indicating gradual improvement despite rather choppy day-to-day fluctuations. The data giving rise to the gradient levels represents not sleep itself but rather my perceptions of it. I monitor the relevance of many hundreds of data events in relation to the gradient. But the events are only useful to the extent that they proxy or allow for the participation of the phenomena of sleep. Participation is much less rigid than the idea of causality: to understand a person, it is not necessary to know the cause; to be part of an ecological system is to exist as a person and not as the outcome of something else. Figure 4 confirms that there has been some improvement in sleep perceptions through intervention - much of it recent, after I gained a better understanding of how to intervene. I use a linear trend line although polynomials would certainly contribute to different perspectives. While it is arguable whether or not a person should deliberately attempt to intervene to achieve improved sleep perceptions, I had no problems halting the consumption of things that seemed to interfere with sleep. For instance a popular remedy said to be worthwhile for arthritis and another for prostate functionality seemed to have adverse consequences on my sleep perceptions. (I want to emphasize that the perceptions are mine only. Others should not infer guidance since none is intended.)
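The linear trend line mentioned above can be sketched in a few lines of Python. This is only an illustration of an ordinary least-squares fit against the day index, not the system actually used to produce the charts; the function name `linear_trend` and the `scores` values are hypothetical and stand in for the daily gradient levels.

```python
def linear_trend(values):
    """Fit a least-squares line to (day index, value) pairs.

    Returns (slope, intercept). A positive slope suggests gradual
    improvement even when day-to-day values are choppy.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily gradient scores (made up for illustration only).
scores = [3.0, 5.0, 2.0, 6.0, 4.0, 7.0, 5.0, 8.0]
slope, intercept = linear_trend(scores)
print(f"trend: {slope:.3f} per day, starting near {intercept:.3f}")
```

A line like this only describes direction and pace. It says nothing about which events moved the gradient, which is the distinction this blog draws between statistics and intervention.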
Between days 46 and 76 on the x-axis, there is a spike suggesting really superb sleep perceptions. I haven't been able to reach such heights for a long time. Because the system is so effective at giving details, I believe I can trace the spike to one particular supplement that I was taking at just that time. It is a supplement that is being examined for its contribution to cardiovascular health. Sadly, my supply actually ran out on day 76. I adapted to the situation, of course. In order to achieve gains without the supplement, I turned to particular types of rhythmic exercises. This is not to say that all forms of exercise help me sleep. Only specific types seem to provide improvement. Fortunately a few weeks ago, I found the supplement once again at a local grocery store. The nature of this supplement indicates that dietary changes might have positive consequences if I were inclined to eat a lot of leafy green vegetables. A problem that has emerged in relation to algorithmic intervention is lack of rational explanation: I might get both the question and answer without necessarily understanding the underlying problem. Intervention then becomes an exercise in risk management: I justify my response from the indicators available without completely understanding the situation. For instance in relation to treatment specifics such as dosages, I cannot internalize the logic a priori. I rely on algorithmic tools that are more responsive to the surrounding circumstances than I am. Nonetheless, it goes without saying, forming reasons to try to explain reality has become a preoccupation. I have found that lower internal body temperature has been associated with improved sleep perceptions; it is all extremely interesting.
Although I can tolerate lack of sleep, sometimes it seems indefinitely, the same cannot be said in relation to poor breathing. Breathing problems justify immediate and aggressive intervention. I have reason to believe that for me breathing perceptions are sometimes related to environmental conditions. One day almost by chance, I decided to view the textiles from my blankets under a microscope. I found that some of my bedding materials were full of loose fibres. I felt that these fibres could be easily inhaled during sleep. So now I frequently change bedding. I choose fabrics that are unlikely to loosen. I have also found certain supplements that for me seem to improve breathing - one specifically that my family doctors prescribed many years ago albeit not for breathing. I normally just visit the vitamin counter rather than a pharmacist for my supply. I find that the supplement helps with breathing on the day it is taken without cumulative benefit. I maintain two different contexts for breathing: one for my morning breathing perceptions (Fig. 5); and the other for the evening (Fig. 6). Similar improvements are indicated on both illustrations.
Figure 5 - Beginning-of-Day Breathing Perceptions
I once continued having breathing problems many months after recovering from pneumonia sometime in the mid-90s. (By the way, we can blame my ability to program in several contemporary computer languages on these breathing problems. The coughing made it quite difficult to maintain a regular job. One of the few things I could still do effectively was program.) I visited a specialist for advice. He said after examining my x-rays, "I don't know if anyone has ever mentioned this to you, but you have chronic sinusitis." It only affected one of my nasal passages. I didn't bother looking up the meaning of this medical term for the blog. However, I can plainly say that my nose has tended to be stuffed up. I haven't deliberately tried any kind of intervention focused on nasal performance. But it is reasonable to expect some mitigation as a result of recent efforts to improve my breathing. I started collecting data to get a better sense of the situation. The gradient for nasal perceptions is also linked to sound - that is, the sounds that I hear while attempting to breathe hard through my nose. The most recent chart including its trend line is shown below (Fig. 7). My sense of smell has been much better recently - better than I ever remember it being. It has been like gaining a completely new kind of nose, which I must say has been rather mood altering.
Supporting Effective Intervention
We sometimes hear stories about people following their GPS devices regardless of what they see in front of them. An organization might create a plan and follow it regardless of the ensuing events - as if the point of the exercise were to follow a plan rather than achieve improvement. In such scenarios, the surrounding reality cannot be said to participate in decision-making. Not all forms of participation are configured in a manner that resembles democratic involvement. Whether or not people vote or assert themselves during public meetings, their interests can be taken into account. This is the essence of participation. Data is much more useful when it is connected to the lives and lived experiences of people. Patients in a hospital can participate in decision-making by having their needs considered; they don't have to be involved in the day-to-day administration of hospital services. Similarly, it is possible for emerging consumer needs to be integrated into product development. Participation does not necessarily manifest itself as democratic representation. Conversely, even if people are involved by their physical presence in participatory processes, this does not necessarily mean that their interests are taken into account.
So I have offered some real-life examples of intervention. I actually had many more, but I deleted some material to control the size of the blog. I don't know how many people would be fascinated by my internal body temperature; if I can turn things to ice through touch, then that's sort of interesting. Unlike algorithmic processes leading to intervention decisions, the outcomes of intervention can be examined using relatively traditional methods: e.g. technical analysis including moving averages; trend patterns, for example using linear regression; and comparison against subjective targets established strategically. Those accustomed to examining metrics of performance through statistics can, I think, continue doing so rather effectively. A statistical evaluation of superficial metrics remains perfectly reasonable. I have simply argued in this blog that traditional strategies don't necessarily support intervention particularly well. Overall, algorithmically-inspired intervention can have beneficial impacts as confirmed by the preceding images. Intervention depends not just on the ability to detect a problem but also on the ability to confirm the effectiveness of mitigation; the latter is only made possible if there is participation in the data. The frontier of intervention involves a type of math focused on the relevance of data to its underlying phenomena (the object) rather than the user (the subject). In contrast, statistical evaluation has mostly been about conforming to external needs of the user. Statistics can tell us the likelihood of a newborn baby becoming a lawyer or doctor - which is useful if we would like to know the likelihood - but the information is otherwise quite pointless from the standpoint of intervention.
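As a rough sketch of the traditional outcome checks mentioned above, the Python below smooths a daily metric with a simple moving average and then measures how often the smoothed value meets a chosen target. Everything here is assumed for illustration: the function names, the window of 3 days, the target of 5.0, and the scores themselves are hypothetical and do not come from the data behind the figures.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def share_at_or_above(values, target):
    """Fraction of values that meet or exceed a subjective target."""
    if not values:
        raise ValueError("no values to evaluate")
    return sum(1 for v in values if v >= target) / len(values)


# Hypothetical daily scores, window, and target (illustration only).
scores = [3.0, 5.0, 2.0, 6.0, 4.0, 7.0, 5.0, 8.0]
smoothed = moving_average(scores, window=3)
print("smoothed:", [round(v, 2) for v in smoothed])
print("share meeting target:", round(share_at_or_above(smoothed, 5.0), 2))
```

Checks like these evaluate the outcome after the fact. They confirm whether the metric moved, not which data events deserve the credit.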