The celebrity of Toronto's mayor has certainly drawn a lot of attention to the city in recent years. Several candidates are now running for Rob Ford's job. Since the mayor is currently undergoing treatment for cancer, he decided to withdraw his candidacy in the upcoming municipal election. Being a longtime resident of Toronto, and being aware of the city's wealth and poverty, I'm always interested in how these competing needs play out when it is time to vote. Consider the interesting dynamics of elections on a broader level: people carrying different needs, desires, and expectations assert their presence in society by placing a tick mark on a ballot. The ballots are then tabulated to determine the distribution of power. An alien species from another galaxy, perhaps with limited knowledge of humans, might try to gather insights from society by examining the election results. It would be quite difficult to explain to these intergalactic creatures the many dimensions behind the casting of ballots. How can a person possibly hope to deal with problems such as social inequity, homelessness, and poverty by placing tick marks on a slip of paper? Such metrics are quite detached from the lived realities of people. Elections serve as a tool to support the existing power structure and assert authority. Their effectiveness in terms of dealing with problems is debatable. In this blog, I hope to clearly differentiate between the idea of a data proxy (i.e. metrics) and the underlying data events (e.g. lived realities that people face). I will then discuss how data can be weaponized through the deliberate and systematic use of proxies. While it might seem that proxy-phenomena differentiation and data weaponization are two separate topics, the latter is actually made possible by the former as I will explain in some detail.
Mr. Howard, a former math teacher of mine, once said that the decision to use symbols to represent quantities was an important development in human civilization. He was referring to the use of "2" to indicate two items; "3" for three items; and "4" for four items. When we say that we have 8 items, we really mean that we assert a value of 8. In real life, it isn't unusual for our asserted value to differ from reality. A bag holding 8 grapefruit might actually contain only 7. For me, I sometimes find bad grapefruit included in the bag; and although the package might not distinguish between fresh and rotten grapefruit, obviously I feel short-changed if the bag contains bad fruit. In any event, through this symbolic abstraction, I can boldly write that XYZ Car Dealership has 10,000 cars - just hypothetically. The use of symbols helps to facilitate our detachment from reality. We accept that "8" represents 8 items for the sake of argument; we assign 8 to the symbol 8 although it could have been assigned to W or Q. If the closing price of a stock is $14.50, what does this represent? There was a barking dog; crying child; transit delay; lady misplacing her apartment keys; maybe a snowstorm. What are these things? They possibly contributed to the $14.50. It isn't an unreasonable assertion. In real life, $69.95, the amount I paid recently for a vacuum cleaner, includes the cost of raw materials, administration, storage, utilities, shipping, financing, marketing, and so forth. I am saying that a metric is often something simple representing a much higher level of complexity.
In prior blogs, I referred to data proxies as the "head" of data. I logically described data events as the "body" of data. The body is much larger than the head, in case it isn't apparent in the illustration above. My objective in the image is simply to loosen the bonds between body and head; part of this is to facilitate later weaponization where decapitation will occur. We participate in data decapitation all the time. I recall that not long ago, there were several deadly factory fires in the garment industry. We import cheap clothes rather detached from the enormous social context: uneducated women and children working long hours for little compensation and apparently minimal health and safety protection. There is a vast amount of data concealed behind the proxy. Are these people and their interests irrelevant? When a tanker full of oil starts leaking crude over pristine waters, we often measure the positive economic benefits, apparently oblivious to the ecological ramifications. This is due to the proxy-phenomena disassociation. I'm not the first person to mention this pervasive socio-economic alienation. However, I am probably more unusual in terms of my focus on data alienation.
If we had a metric such as consumer sentiment, clearly this quantitative expression is based on a substantial body of data. It is necessary to differentiate between the proxy and its events. The proxy is the thing that we would like to influence. So any attempt to improve consumer sentiment would involve addressing the individual components of the index. The fact that consumer sentiment might be increasing is actually pointless without knowing the connection to its body of events. Sometimes, the need might not be to influence the proxy but simply monitor it or improve our understanding. For instance, another type of proxy is a geographic area having specific boundaries. The proxy isn't merely a means to convey a simplified assessment of the underlying phenomena occurring within the spatial parameters; but it helps us to divide and arrange the reality in a cohesive manner. We might conflate the outcome (the assessment) with the management of data (the division of facts intended to further our understanding). When we say that the value of something is 8, this generally means that 8 is the value that we would like to assert. But we can also say, for instance, I have 2 terabytes of data to associate with 8. It is possible to infer value over data assignments.
There has been a tendency to try to tackle problems by studying metrics; but the lack of body can muddle remedial efforts. For instance, studying the number of workplace accidents doesn't actually offer any insights in terms of reduction or correction. Yet we often become aware of problems through their metrics. I think that this can sometimes lead to the illusion that the problem is in the metrics. Thus, a government strategy to create jobs might be to pay one person to dig a hole and another to fill it. Place these people on a deserted island, and the pointlessness of their activities should become apparent since they would both starve. Digging a hole merely for the next person to fill is unproductive. The embodiment principle is that data generally exists as head and body, and these parts can sometimes become detached. Conversely, the data is normally in the form of either head or body, and these parts can sometimes be attached. My emphasis really is that the parts don't necessarily exist connected. Embodiment involves structurally associating events with metrics. This might be described as embodiment or capitation.
Assignment and Replacement
Since I tend to use the same proxies repeatedly, my personal database maintains a "proxy distribution file": this file currently contains about 60 proxies. Events are distributed in bulk to all of the proxies at least once each day. There is a master proxy called "main" that ensures the event-data has a place to stay even in the absence of other proxies. During those instances where I might want to make use of a foreign proxy, my system can generate a list of all valid batch dates to match against the foreign proxy. There is then a process called "digestion" where the system distributes event-data from the master to the foreign proxy. In this manner, it is possible to associate massive amounts of event-data, for example pertaining to a public transit system, with a proxy such as fares purchased. I would say the most sensible application of this technology is to "scout" the surroundings. This is a peaceful path for those who wish to use data to understand the world around them.
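To make the flow of distribution and digestion a little more concrete, here is a minimal sketch in Python. It is not the actual database or prototype; the class name `ProxyStore`, its methods, and the sample transit events and fare counts are all hypothetical, chosen only to illustrate the idea of a master proxy feeding a foreign proxy on matching batch dates.

```python
from collections import defaultdict
from datetime import date


class ProxyStore:
    """Hypothetical store: each proxy maps batch dates to lists of events."""

    def __init__(self):
        # "main" always exists, so event-data has a place to stay
        # even when no other proxy has been registered.
        self.events_by_proxy = {"main": defaultdict(list)}
        self.values_by_proxy = {}

    def add_proxy(self, name, values=None):
        """Register a proxy, optionally with its metric values keyed by date."""
        self.events_by_proxy.setdefault(name, defaultdict(list))
        self.values_by_proxy[name] = dict(values or {})

    def distribute(self, batch_date, events):
        """Bulk-distribute one batch of events to every registered proxy."""
        for buckets in self.events_by_proxy.values():
            buckets[batch_date].extend(events)

    def valid_batch_dates(self, foreign_values):
        """Dates on which main holds events AND the foreign proxy has a value."""
        main = self.events_by_proxy["main"]
        return sorted(d for d in foreign_values if d in main)

    def digest(self, name, foreign_values):
        """Copy event-data from main into a foreign proxy on matching dates."""
        self.add_proxy(name, foreign_values)
        main = self.events_by_proxy["main"]
        target = self.events_by_proxy[name]
        for d in self.valid_batch_dates(foreign_values):
            target[d] = list(main[d])
        return target


# Made-up example data, for illustration only.
store = ProxyStore()
store.distribute(date(2014, 9, 18), ["late bus", "rain"])
store.distribute(date(2014, 9, 19), ["on time"])
fares = {date(2014, 9, 19): 1200, date(2014, 9, 20): 1300}
digested = store.digest("fares_purchased", fares)
```

Only the date that appears both in main and in the foreign proxy survives digestion; dates with a fare count but no events, or events but no fare count, are left out.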
The coding behind assignment is straightforward. In fact, there is almost no coding. It is more an issue of ontological reasoning than coding. We wouldn't question the rationale of having a text file named "A" containing "8" thereby resulting in "A = 8." On the other hand, depositing the King James Bible in A might be a bit confusing. Placing the Bible in a file named 8 probably makes no sense at all. If we distinguish the contextual evaluation of 8 from its hierarchical functionality, we discover that the contextual meaning is actually an assertion of proxy; but the hierarchical function supports the containment of events. Our data is 8: the data can carry both intended and actual meanings. The question really is the extent to which the items contained in 8 (articulating truth and reality) are relevant to the instrumental framework giving rise to 8 (projecting perceptions).
I will soon be discussing the replacement of the instrumental aspect of data (its proxy) rather than the underlying events. I want to point out that proxy replacement is something that can be done in order to determine which proxies best fit and explain the phenomena. I call this the metrics of phenomena. As the phenomena change, so too can their metrics. Although we might not fully comprehend the greater reality, we nonetheless can form a relationship that takes it into account. On the other hand, proxy replacement can merely define the reality in which we wish to perceive the phenomena: for example, I would say that this "narrowing" frequently occurs in relation to performance evaluation. "Performance" cannot rely on floating metrics since this would render the assessments non-comparable. I describe this alternate use of proxies, which I believe is prevalent in society today, as the metrics of criteria. Yet another reason to replace proxies is to gain advantage or control over the thing being assessed; this application of replacement is the focus of this blog pertaining to the weaponization of data. In this case, the objective is to understand aspects of the phenomena that can do it the most harm or help the practitioner gain the greatest benefit.
Rationale for "Warping"
Warping is an art - mostly because I don't have the hardware to fully satisfy the processing requirements. In order for me not to waste time and resources, I am forced to go about my warping thoughtfully and selectively. Warping is the term that I chose to describe a process of continuous proxy replacement to test for opportunities and vulnerabilities; this is for the purpose of gaining material or strategic advantage over something or somebody. It represents a kind of data weaponization by virtue of intent and also by the aggressiveness of the methodology. According to my own observations anyways, some individuals habitually go about this process although perhaps not deliberately and probably not using algorithms. From an algorithmic standpoint, warping simply involves assigning large numbers of events to metrics as I have been doing for some time. However, the choice of metrics is much more varied; and presumably the ontology behind the recognition of events is more focused. For example, I would be biased towards events that I can clearly control, which is generally not the case for me now. It is important to note the following: although it might be possible to warp, this doesn't mean that worthwhile insights are necessarily forthcoming. Indeed, there might be no firm relationship between events and a particular foreign proxy. A highly systematized approach is simply to keep grinding the data continuously - essentially mining - even to gain the slightest advantage.
I have an interesting "real life" example to share with readers. I have been collecting my own health event-data for about 15 months. I normally associate these events with health-related proxies to examine different concerns such as sleep perceptions, eyesight, and weight. Just to emphasize the "structural" nature of warping, I will replace my sleep perceptions with a proxy derived from the Toronto Stock Exchange index. I consider this a bold and perhaps rather nonsensical thing to do at least on the surface. The ability to change the proxy means that I should be able to determine the relevance of event-data to the market. I would not expect to find any relevance; but I should be able to make the determination. Well, this is not the sort of thing that just any software can do. I make use of a research prototype specially designed to assign data-events to a given proxy. While reading about my health data, please consider the wider implications of this methodology perhaps in relation to organizational needs.
I downloaded closing prices for the TSX from an online source: the period is from 2013-06-24 to 2014-09-19, which coincides with entries on my health database. The rather unexciting image of the trading pattern appears below. I decided to use numbers from the stock market mostly because it is so accessible and something a lot of people can understand or relate to. A market index contains a basket of stocks, which makes it less volatile than an individual stock; nonetheless, there are day-to-day fluctuations. Perhaps a more coherent application of warping would be to link a company's business events or aspects of its operating environment to its share price fluctuations.
I chose not to use the closing price itself as a proxy. Clearly the trading pattern above exhibits considerable stability. The stability of the price represents a problem for anyone trying to test cause-and-effect: the "effect" part of the evaluation would be fairly non-repetitive, resulting in fewer opportunities to confirm causality. I chose a measurement of volatility as an alternative to the closing price: C2/C1 - 1, this being the current closing price divided by the previous close less 1. In contrast to closing price, volatility as shown on the next image contains some repetitiveness. I set all of the resulting numbers side-by-side in order of volatility. The pattern below is derived from actual trading data. My objective at this point in the blog is simply to show that a foreign proxy can be transposed over another. In this transposition, the entire proxy-phenomena relationship changes. What events were relevant using a health proxy might be meaningless in relation to a volatility proxy.
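The measurement C2/C1 - 1 is simple enough to sketch in a few lines of Python. The function name `daily_changes` is hypothetical, and the closing prices below are made up for illustration; they are not actual TSX data.

```python
def daily_changes(closes):
    """Return C2/C1 - 1 for each consecutive pair of closing prices."""
    return [c2 / c1 - 1 for c1, c2 in zip(closes, closes[1:])]


# Made-up closing prices, for illustration only.
closes = [100.0, 101.0, 99.99, 99.99, 102.0]
changes = daily_changes(closes)

# Setting the results side-by-side in order, from largest drop to largest gain.
ranked = sorted(changes)
```

A series of n closing prices yields n - 1 values, since the first close has no previous close to divide by.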
The next image represents my end-goal or objective. The foreign proxy has now replaced the health proxy. I created a contextual gradient on the y-axis: in the case of the current example, this "gradient" is made up of volatility levels. This is not to say that the volatility level is its gradient value; but rather, these levels give rise to gradient values through the use of an algorithm. The algorithm determines the relevance of events using the different volatility levels. In light of the above distribution, presumably an event might coincide with volatility purely by chance; however, the likelihood of coincidence declines as more events occur over time. Interestingly, at the end of the process, I still found myself with this curious situation where certain health events still seemed connected to particular market fluctuations. I will reveal additional details at this point: the research prototype had been set with a 1-day lag so I might "predict" next-day fluctuations from present-day health events.
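The algorithm inside the research prototype is not described here, so the following Python sketch substitutes a deliberately simple stand-in: for each event, it takes the average market change one trading day after the event occurred and subtracts the average change over all days. Everything in it is an assumption for illustration, including the function name `lagged_relevance`, the `min_count` threshold used to set aside one-off events, the use of integer trading-day indices, and the made-up events and changes.

```python
from collections import defaultdict
from statistics import mean


def lagged_relevance(events_by_day, change_by_day, lag=1, min_count=2):
    """Score each event by the mean change `lag` days later, minus the overall mean.

    Days are integer trading-day indices. Events observed fewer than
    `min_count` times are skipped, since a single coincidence says little.
    """
    baseline = mean(change_by_day.values())
    following = defaultdict(list)
    for day, events in events_by_day.items():
        target = day + lag
        if target in change_by_day:
            for event in set(events):
                following[event].append(change_by_day[target])
    return {
        event: mean(values) - baseline
        for event, values in following.items()
        if len(values) >= min_count
    }


# Made-up data, for illustration only.
change_by_day = {1: 0.01, 2: -0.02, 3: 0.01, 4: 0.0}
events_by_day = {
    0: ["poor sleep"],
    1: ["headache"],
    2: ["poor sleep"],
    3: ["long walk"],
}
scores = lagged_relevance(events_by_day, change_by_day)
```

A score near zero means the event tells us nothing about the next day's change. A score far from zero is only a candidate for closer inspection, not evidence of causality, and with so few observations it could easily arise by chance.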
As the above illustration shows, most of my personal event-data has no apparent relationship to the market: the distribution is quite close to the zero line. Moreover, in terms of those events not near the zero line, a fair number of them are non-repetitive sporadic events that might logically be associated with market fluctuations purely by chance. This then leaves me with the oddballs that appear to precede the fluctuations. Since I am using health event-data in relation to a market proxy, there are no firm lines of reasoning explaining how the events might be related to the market. Further, the fact that there might be a relationship doesn't mean it is something exploitable; because as I mentioned earlier, the database contains all sorts of event-data where I might have no clear control. Even if I did have control, this does not mean that controlling the event-data then leads to control over market fluctuations. I immediately recognize the lack of causality in this particular proxy-event combination. But I want to suggest that sometimes, depending on the exact selection, there might in fact be some level of causality.
Double Metering Symmetry
Nothing about my personal health events can be logically identified as antecedent to the market in terms of influencing fluctuations or closing prices. But let's consider a principle related to causality that I call the "double metering effect": both the market and I might behave in a coincident manner in response to certain deeper but poorly articulated phenomena. I share a relationship with others around me, being exposed to the same world events and having certain predispositions. We buy the same coffee, eat the same donuts, and probably watch similar television programs. Even if we didn't, there might nonetheless be a stable relationship. I know there are all sorts of valuation schemes such as the present value of expected income streams; relative-market valuation differentials; sentiment and momentum. I'm not dismissing any of these perspectives. I merely suggest that a market is sometimes a market because there is a market - that is to say, a bunch of people with certain desires and expectations obtaining specific things perceived to be important in their lives. Perhaps many of them don't come with financial calculators. There are day-to-day shifts in valuations that simply cannot be explained using conventional methodologies.
This reminds me of a funny story about science fiction movies. When I was in high school, I frequently felt out of place in English class because science fiction wasn't really taken seriously. I don't have any issues with Great Expectations or Macbeth. However, these days it seems like many books, films, and television shows are meant for people like me. When I invest, I assume that others might rush ahead of me or follow behind; indeed, I might be the person lagging. There is a flow. It is a social flow. I find myself captivated seeing dozens, at times hundreds of birds flying together almost like a convective current. I have such respect for these little societies that form among wildlife. I'm saying that there might be coincidences of predisposition rather than pure happenstance. It affects the simplest creatures. Perhaps people aren't much different.
Shades of Darkness
Christopher Columbus over the course of his journeys wasn't just sight-seeing. These expeditions represented significant capital investments. Irrespective of his original destination, at some point he decided to search for gold and slaves. He participated in a rather violent form of exploration. Data-gathering has likewise been associated with discovery, but clearly it can sometimes serve to exploit others. It's no wonder why intelligence must occur clandestinely. I'm not actually questioning why people collect data. I just want to emphasize that there is an "aggressive" form of discovery; this can sometimes conceal important aspects of surrounding phenomena because it was never meant to truly enhance understanding. On the other hand, how can an organization be blamed for collecting the wrong information or for doing it poorly or in a malicious manner when it might not understand its own circumstances? Lack of careful thought can be destructive; it can cause an organization to harm itself and others.
The most aggressive form of discovery involves systematically combing through proxies in search of opportunities and vulnerabilities - both in others and oneself. I believe that generally speaking, an organization isn't faced with the question of whether or not to be aggressive. But rather the organization might be at a total loss - literally confused and disoriented. I don't consider it unusual for an organization to lack knowledge about itself. During periods of instability, perhaps when confronting market barriers, existing intellectual capital might offer little guidance. I consider the use of both indigenous and foreign proxies worthwhile especially for organizations that have little sense of market placement. I'm big on environmental sensitivity. But no organization wants to know about the environment per se; rather, it needs to understand how to strategically place itself. Almost inevitably it is necessary to contextualize at least partly from foreign proxies. The question of how one might go about doing this is addressed by warping.