"Look, they are one people, and they have all one language; and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them. Come, let us go down, and confuse their language there, so that they will not understand one another's speech." (Genesis 11-6,7) On the distance between expression, meaning, and action resulting from growth of populations and diverse competing interests - or angelic intervention - one of these. Also as explanation in advance if this blog is difficult to follow.
Those that design questionnaires should be familiar with the two distinct parts: 1) demographics; and 2) responses to the questions. Almost every questionnaire for me has asked for my age and income. In this blog, I will be taking the idea of demographics to a more abstract level. Demographics facilitate attribution for the rest of the data collected. Attribution need not involve anything like age, education, or income. From a survey that gathers information about employment, it might be possible to determine the employment rate. Attribution in a data system enables storytelling. The story here is that the employment rate is, well, the employment rate. The narrative would be enriched by the ability to associate the data with different segments of the population: e.g. which segments of the population are employed, to what extent, and doing what types of work? However, if a survey of operations is conducted on a production facility, the attribution would likely relate to particular products and processes rather than people. The attribution-versus-data divide is an important conceptual separation: there is 1) the data such as the metrics collected from the facility; then there is 2) the attribution scheme intended to help managers make decisions. The former is "attached" to the latter. If data were to persist in a disembodied state, it would be difficult to make meaningful use of it. Attribution can at times be a form of contextual assertion or ascription rather than the recognition of inherent character. A story containing only numbers has a paralyzed plot. "Analysis paralysis: There has been an increase in overtime since spending declined on office supplies." Yes, so? "Attributive narrative: Well, the ink ribbons are being re-inked to save money. These re-inked ribbons are causing the printers to freeze up. Jobs have to be repeated three or four times. Deadlines are being missed. We have to pay people to do the work longer." 
Numbers alone don't necessarily provide the structural information needed to guide constructive change.
When researchers ask for age, implicit in the design of the survey is that age matters. This does not mean that age necessarily "matters" in a true sense but rather in relation to those asking the questions. It remains conceivable that some researchers might try to collect demographics such as age from people working in production. I would consider it unfortunate if researchers attempted to do so; this almost suggests that at some point a manager might attempt to influence production through discriminatory practices. (There might be an employment test that young men specifically are more likely to pass, which again tells us more about the researchers than the subjects of the testing.) Consider calls going to the complaints department of a retailer. Given the opportunity, some researchers might try to extract demographics from callers. But the callers won't stand for it. They have complaints, after all. They are not calling for laughs. Unless the calls will be further distributed to service agents based on the age of the caller, the age serves no useful business purpose. The main purpose of attribution in the complaints department is to determine what went wrong. Constructing attribution in such a situation requires some thought from a design standpoint. If data is collected without effective attribution, the problem giving rise to the complaint might persist; this is due to our inability to locate and therefore target the problem. Intelligence in the fight against terrorism should be about locating sources of terror; so one would expect a sophisticated attribution regime operating in the background rather than endless floods of disconnected data.
Apart from storytelling and the targeting of operational problems, attribution also represents an important way to deal with unstructured data. In a moment, I will be discussing the ACTOR Attribution Model, which I developed myself. I used it to extract data from legal documents. As shown in a sample below, although a case document might be 15 to 150 pages in length, I was able to complete the same standard survey in each case. Unstructured data on the surface is unstructured due to lack of fixed structure. However, I would say that it is also unstructured because of its tendency to evade a particular fixed context: e.g. this cell is for sales; sales; sales; more sales; more sales. The unstructured data will not fit this predefinition not just of structure but of meaning. It doesn't provide a fixed commodified packet of thoughts, ideas, or details in a specifically prescribed manner. I realize that people tend to focus on the presentation of the data: it is this presentation that usually determines whether the data is described as structured or unstructured. Suffice it to say that I regard attribution as the missing link that is nonetheless completely necessary to give unstructured data meaning. Sometimes, the attribution or structural details inside data are the meaning that has to be conveyed. The cases that I reviewed didn't contain a lot of demographics, which in any event are rather superfluous. However, judges have a cumulative understanding of where facts fit in their decisions. Although perhaps not normally invoked as a kind of abstract model, conceptual attribution likely helps to guide the assessment of facts and attachment of relevance.
Attribution and Assignment to Mass Data Object
Rather than associate data with demographics, data can be connected to "concepts." In relation to these connections, I use the term "assignment." Production metrics, stats, and scores can be "assigned" to production personnel in order to monitor their performance. However, I put aside the issue of what is being assigned in order to consider the question of how to characterize the data object receiving the assignment. Rather than assign to demographics, how might a person conceptualize or construct a framework for the "assignment object" such that data being assigned can be related to anything? "I have a complaint. The person who sold me this dishwasher gave me faulty information about water and electricity consumption. I want my money back." Again setting aside the question of metrics to assign, I suggest that this comment is mostly about attribution. This and similar problems are not going to be addressed by changing prices, having a sale, offering credit promotions, or increasing advertising. I will now share more details of my own research. I want to emphasize before proceeding that, although many people talk about data in a loose fashion, I am rather concrete about the separation between the following: 1) the metrics and qualitative events; and 2) indexes, in this case the assignment object for the attribution. It is important to be specific and explicit because there is no place to hide when one attempts to create coded solutions. Below I offer ACTOR and its facets of attribution. The items in blue are aspects of the survey sample that I introduced earlier. The items that are uncoloured are additions made in recent years.
Although I started working on ACTOR in 2009, the attribution model as it exists today is primarily the result of research that I began during my graduate studies in 2010. In a past blog - perhaps in several past blogs - I mentioned using ACTOR to study human rights tribunal cases in Canada. ACTOR has a pedigree. After I noticed how several existing models relating to organizational behaviour (for example) seem to contain common elements, I decided to pose these commonalities in more abstract terms. To the best of my knowledge, I believe that I am responsible for the hierarchical nature of ACTOR. I found the hierarchy missing from other models, leading to noticeable disconnection among their elements. The way researchers might attribute to demographics the data pertaining to complaints, it is possible to "assign" data to a conceptual scheme such as ACTOR. My research had to do with customer service complaints involving people with disabilities. Although this research probably seems highly specialized, I consider my approach and findings fairly portable. People with disabilities represent a vulnerable group. The cases are special considering the elevated impacts to the customers involved. However, I consider the narrative of human rights complaints mostly about neglect, indifference, and apathy. These are concerns that affect everybody rather than only people with disabilities. I would portray my research as a study of customer service problems albeit on a deeper social level rather than just administrative or business. In relation to this blog, ACTOR can help in the assignment of customer service complaints data.
In customer service complaints brought before human rights tribunals, I found that cases in favour of the plaintiff tended to contain comments involving at least one of the following: A) attitude; C) conduct; T) tenacity; O) organization; and R) role recognition. I cannot provide a detailed explanation of these categories here; my cursory paper on the subject was about 50 pages long. However, returning to the complaint implicating the dishwasher salesperson, I would say that the comment made is mostly about "organization" (e.g. systems in place to provide accurate product information); it might also involve "role recognition" (accepting the need to confirm the accuracy of facts). The illustration indicates that ACTOR provides discourse attribution. Almost inseparable from discourse is the question of polarity: N) negation; C) confirmation; and A) affirmation. The comment pertaining to the dishwasher salesperson involves "negation." The client has something negative to say about the information that was provided. The client also identifies a specific individual - the salesperson. The pointing of blame involves "attribution": M) mind; B) body; E) environment; S) system. In this case, I would say that the comment likely involves both "body" and "system." It really depends on the client whether blame is being attached to the salesperson, the system that the salesperson was using, or possibly both system and person.
The data object is only half completed. The question of "reference" involves identifying the source of the attribution. For example, if a body is involved, where is the body located? The elements are as follows: M) manufacturer; W) wholesaler; R) retailer of the manufacturer pre- and post-sales; B) bulk or mass-market retailer pre- and post-sales; D) dealer or specialty retailer pre- and post-sales; and C) customer service pre- and post-sales. I think I've covered most of the bases. The dishwasher complaint involves "bulk or mass-market retailer pre-sales." Next, "contact" pertains to the level of contact: I) interactive; A) administrative; D) design; and S) strategic. "Interactive" means the person directly interacting with the client, in this case the salesperson. Finally, there is the question of level of implication in the blame: I) incidental; P) procedural; S) structural; and C) constructive. Based on the comment from the client, one would hope that the dishwasher case is "incidental." If providing bad information were part of a procedure, this would certainly be more serious than a particular incident. Yet it may very well be that for some organizations with certain repetitive processes, the same errors might recur irrespective of agent, leading one to conclude that the problem goes beyond mere incident; and the client might say something to this effect.
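As a rough illustration only, and not the original implementation, the six facets described above could be sketched in Python as enumerations collected into a single assignment object. The class names, the field names, and the pre_sales flag are assumptions introduced here for the sketch; the element names and letter codes follow the lists in the text. Each facet holds a set because a single comment can involve more than one element (e.g. both "body" and "system").

```python
from dataclasses import dataclass
from enum import Enum


class Discourse(Enum):
    ATTITUDE = "A"
    CONDUCT = "C"
    TENACITY = "T"
    ORGANIZATION = "O"
    ROLE_RECOGNITION = "R"


class Polarity(Enum):
    NEGATION = "N"
    CONFIRMATION = "C"
    AFFIRMATION = "A"


class Blame(Enum):  # the facet the text calls "attribution"
    MIND = "M"
    BODY = "B"
    ENVIRONMENT = "E"
    SYSTEM = "S"


class Reference(Enum):
    MANUFACTURER = "M"
    WHOLESALER = "W"
    RETAILER_OF_MANUFACTURER = "R"
    BULK_RETAILER = "B"
    DEALER = "D"
    CUSTOMER_SERVICE = "C"


class Contact(Enum):
    INTERACTIVE = "I"
    ADMINISTRATIVE = "A"
    DESIGN = "D"
    STRATEGIC = "S"


class Implication(Enum):
    INCIDENTAL = "I"
    PROCEDURAL = "P"
    STRUCTURAL = "S"
    CONSTRUCTIVE = "C"


@dataclass(frozen=True)
class AssignmentObject:
    """Hypothetical container: one set of chosen elements per facet."""
    discourse: frozenset
    polarity: frozenset
    blame: frozenset
    reference: frozenset
    contact: frozenset
    implication: frozenset
    pre_sales: bool = True  # assumed flag for pre- versus post-sales


# The dishwasher complaint, coded as it is read in the text.
dishwasher = AssignmentObject(
    discourse=frozenset({Discourse.ORGANIZATION, Discourse.ROLE_RECOGNITION}),
    polarity=frozenset({Polarity.NEGATION}),
    blame=frozenset({Blame.BODY, Blame.SYSTEM}),
    reference=frozenset({Reference.BULK_RETAILER}),
    contact=frozenset({Contact.INTERACTIVE}),
    implication=frozenset({Implication.INCIDENTAL}),
)

# Total number of elements across all six facets.
TOTAL_ELEMENTS = sum(
    len(facet)
    for facet in (Discourse, Polarity, Blame, Reference, Contact, Implication)
)
```

The object is frozen so that it can serve as a stable index: metrics and qualitative events get attached to it, not mixed into it.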
I quickly covered a lot of letters in the preceding. I will elaborate a bit further on "implication" just to demonstrate some of the shades of meaning that affect attribution. By implication, I mean that blame is imputed. In my customer service paper, I identified two main types of implications made by the adjudicators reviewing cases: 1) event level based (ELB); and 2) character level based (CLB). ELB is equivalent to an "incidental" level of implication. CLB is more like the "structural" level. Essentially, I found that an adjudicator sometimes commented on the specific incident of the case; corrective action for these incident-oriented cases is normally a simple matter. At other times, the adjudicator commented about the character of the service provider. A customer service complaint can therefore be regarded as the "tip of the iceberg" indicating a need for major change. Similarly, when a client makes a complaint (as opposed to an adjudicator imputing blame), he or she might be focused on the particulars of a specific incident (incidental implication). Periodically the grievance might involve the processes created for clients (procedural implication); the way things are done (structural implication); and how the company has chosen to do business (constructive implication). The choice of implication is important in order to characterize the nature of a customer service concern and, more importantly, who has been implicated in the service failure and who might be expected to change or be responsible for change.
As with ACTOR, I regret not being able to cover all of the attribution elements here. Consider taking my word for it: the elements of the ACTOR Attribution Model can capture the gist of many types of customer service complaints. Having a system of attribution is not the same thing as having measurements or qualitative events. I realize that the focus of analysis has tended to be on the data stream: e.g. the stock price rather than the structural details responsible for the prices. I believe this is so because attribution is rather complicated. In any event, a person such as myself will usually try to piece together the structural determinants; in doing so, I certainly do not wish to diminish the more traditional approach of simply examining the data stream. I know that some readers might be wondering, is this really data science? If it isn't, then the people doing data science might lack the means to connect their findings to the production setting. (It wouldn't be clear what in production or operations should be changed in order to alter future outcomes.) True enough, it might be possible to "predict" future outcomes. However, an organization really wants to "control" future outcomes. To this end, I consider it important to be able to make sense of organizational complexity enough to influence the metrics. I have written about hunting for and tracking "ghosts." This isn't for the sake of paranormal adventure. Attribution is about making use of structural information to bring about positive change.
Returning to the dishwasher complaint, all of the blame could have been attributed to the specific sales agent right at the outset (a non-conceptual attribution). If I am in Toronto studying data from different sources, I might become aware of the dishwasher complaint regarding "Associate J. Smith" in Hull. However, this doesn't tell me much on a systemic or structural level. If J. Smith leaves the company and yet similar types of complaints emerge regarding other associates, it seems that I lack the means to make firm connections. I wouldn't be able to make sense of the complaints, at least from a management standpoint. I consider it unlikely that an agent would want to deliberately give inaccurate information about a product. Arguably, it takes less effort to give accurate product specifications to clients than to mull over and precariously offer fake numbers. The complaint is really about the lack of an easily accessible system to support agents with accurate product data. The structural nature of the problem would be invisible if complaints were merely attributed to particular agents. Attributing to an agent is a tiresome "gut instinct." The usual remedial response might be as follows: "Are the agents providing bad data to clients? This probably means that the agents are bad. Hire better agents." This assertion is hardly scientific. Any child can give it since little thought is required. It also messes up the allocation of resources and causes people to do pointless things, which I suppose would be humorous if the outcomes weren't so serious.
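The contrast between attributing to agents and attributing to concepts can be shown with a small Python sketch. The complaint records below are invented purely for illustration (only "J. Smith" comes from the text above), and the field names are assumptions. Counting by agent scatters the complaints across individuals; counting by conceptual attribution concentrates them on one structural cause.

```python
from collections import Counter

# Hypothetical complaint records: each names an agent (non-conceptual
# attribution) and carries a conceptual attribution (discourse, blame).
complaints = [
    {"agent": "J. Smith", "discourse": "organization", "blame": "system"},
    {"agent": "Associate B", "discourse": "organization", "blame": "system"},
    {"agent": "Associate C", "discourse": "organization", "blame": "system"},
    {"agent": "Associate D", "discourse": "conduct", "blame": "body"},
]

# Attribution to particular agents: every count is 1, so no pattern appears.
by_agent = Counter(c["agent"] for c in complaints)

# Conceptual attribution: the same records pile up on one structural cause.
by_concept = Counter((c["discourse"], c["blame"]) for c in complaints)

top_concept, top_count = by_concept.most_common(1)[0]
```

With these invented records, no agent accounts for more than one complaint, while the ("organization", "system") pairing accounts for three of the four. That is the kind of firm connection that survives an individual agent leaving the company.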
At this point, I have covered enough to assign pertinent events to the data object. Do I want to know how the level of sales is affected by complaints? If so, I could assign sales to the object. How about the specific department, time of day of the sales, the music playing in the background, the weather outside? It is a mass data assignment. It doesn't matter how much data gets assigned. This then is the main difference between attribution and ordinary metrics: the attribution, by providing context, helps me to make decisions based on the metrics; the metrics on the other hand frequently lack contextual guidance. If an organization has a lot of metrics but minimal attribution, it doesn't have much at all. Some of its data is probably just taking up server space. Due to the lack of attribution, such organizations might not even be aware that their data is impaired or how it came to be this way. If sales are in the gutter, it's difficult to blame the problem on anything specific since the organization has minimal attribution capacity. The company could just as easily blame declining sales on bad lighting or lack of hand soap in the washrooms. Sure, that might seem ludicrous. But I have a case study where, in order to deal with rising health care costs, managers thought about "flattening the organization." Health care costs rise when people become sick. People do not get better by flattening things or shuffling assets - unless of course the assets increase fresh air ventilation, improve water quality, and promote physical activity. Don't confuse neoliberal quasi-intellectualism with actual intelligence.
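One possible way to picture a mass data assignment, offered as a minimal sketch and not as the actual system, is a mapping from an attribution key to an open-ended list of events. The key layout, the assign helper, and all of the values below are hypothetical. The attribution stays fixed while any amount of loosely related data is attached to it.

```python
from collections import defaultdict

# Attribution key -> every event assigned to it, in any shape.
assignments = defaultdict(list)


def assign(attribution, **event):
    """Attach an arbitrary event (any keyword fields) to an attribution key."""
    assignments[attribution].append(event)


# Hypothetical key: one element per facet, in a fixed order
# (discourse, polarity, blame, reference, contact, implication).
key = ("organization", "negation", "system",
       "bulk_retailer", "interactive", "incidental")

# Made-up events: sales, department, time of day, weather.
assign(key, sales=1250.0)
assign(key, department="appliances", time_of_day="evening")
assign(key, weather="rain")
```

Nothing about the events needs to be declared in advance; the attribution key alone determines how they are later retrieved.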
I consider conceptual attribution ideal for mass data assignments in a non-relational processing environment. However, I couldn't think of a way of presenting such an application on this blog. I therefore decided to offer the relational depiction below. Pretend this is a spreadsheet. The idea of "assignment" is difficult to convey on a spreadsheet. Essentially when sifting through discourse, all sorts of qualitative details can be recognized. The multifarious nature of these details prevents meaningful tabulation: specificity for example can render an event relatively non-repetitive. There might also be hundreds of thousands of qualitative details. The idea of sorting the spreadsheet based on "freeform qualitative" events is impractical. On the other hand, if the events are assigned or conceptually attributed, the attribution model determines our ability to access the qualitative details. The attributes in the ACTOR Attribution Model introduced in the preceding can be stored in a number containing 26 bits (one bit for each element across the six facets), which is smaller than the 32-bit integer common on most computers. However, if a spreadsheet is to be used, I could simply set aside 26 columns to accept user input. (It remains easier to deal with 26 columns of conceptual attribution than thousands of columns of freeform qualitative events.) Some attributions are non-conceptual. Common non-conceptual attributions include account and reference numbers. I would separate the non-conceptuals from the conceptuals. Finally, for selection purposes, I personally would never sort the data. I would instead create "attribution filters"; there can be many of these, so I would give them their own prominent real-estate, usually somewhere to the left of the spreadsheet. Don't put them to the right of the freeform qualitative events; after all, these might continue for thousands of columns.
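The 26-bit storage and the attribution filters can be sketched in Python as follows. This is an assumed encoding for illustration, not the original one: the facet and element labels, the bit order, and the encode and matches helpers are all hypothetical. Each element gets one bit, a coded complaint is the bitwise OR of its elements, and a filter is a mask that must be fully present.

```python
# One list of elements per facet, following the letters introduced earlier.
FACETS = {
    "discourse": ["attitude", "conduct", "tenacity", "organization",
                  "role_recognition"],
    "polarity": ["negation", "confirmation", "affirmation"],
    "attribution": ["mind", "body", "environment", "system"],
    "reference": ["manufacturer", "wholesaler", "retailer", "bulk_retailer",
                  "dealer", "customer_service"],
    "contact": ["interactive", "administrative", "design", "strategic"],
    "implication": ["incidental", "procedural", "structural", "constructive"],
}

# Give each (facet, element) pair its own bit position, 0 through 25.
BIT = {}
for facet, elements in FACETS.items():
    for element in elements:
        BIT[(facet, element)] = 1 << len(BIT)


def encode(**chosen):
    """Pack the chosen elements, e.g. polarity=["negation"], into one integer."""
    code = 0
    for facet, elements in chosen.items():
        for element in elements:
            code |= BIT[(facet, element)]
    return code


def matches(code, mask):
    """Attribution filter: true when every bit of the mask is set in the code."""
    return (code & mask) == mask


# The dishwasher complaint and two example filters.
dishwasher = encode(
    discourse=["organization", "role_recognition"],
    polarity=["negation"],
    attribution=["body", "system"],
    reference=["bulk_retailer"],
    contact=["interactive"],
    implication=["incidental"],
)
negative_about_system = encode(polarity=["negation"], attribution=["system"])
structural_only = encode(implication=["structural"])
```

Here matches(dishwasher, negative_about_system) is true and matches(dishwasher, structural_only) is false, so filtering selects rows without ever sorting them. On a spreadsheet, the same idea corresponds to 26 yes/no columns with a filter row per question of interest.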
Qualifications of Those Making Attributions
Although the general public can be loose with freeform details, it takes a fine knife and steady hand to deal with attributions. Perhaps at some point a machine will be able to do such delineations. Is a data scientist, similar to a statistician, necessarily better qualified than anybody else? I don't believe so. Attribution makes use of one's ontological capabilities. In order to recognize something as something, it is already necessary to know what it is. For example, if I want to identify "poverty" in a situation, I must already know what poverty is. Is going to vocational college rather than a university a form of poverty? Is mopping floors indicative of poverty - due to lack of job security, the wages, servitude to a demanding employer, imposition of physically challenging deadlines? If poverty is simply an aspect of income, then I shouldn't be able to recognize it without income data. Yet the people doing the marketing for charities use perceptions of poverty to encourage donations. What is "Attraction" (the "A" in ACTOR)? This is something that can be entrenched and socially constructed. It reflects business motivations. Attraction can be relevant at the earliest stages of business development. A statistician doesn't necessarily have insights by virtue of profession or academic background. I'm unsure if everyone can recognize Attraction in a manner reliable and consistent enough to support the retrieval of important facts. ACTOR is hierarchical. "A" precedes and brings about "C." It is necessary to understand the conceptual shades in the transformation from Attraction to Conduct.
Future of Change
Conceptual attribution is probably worthwhile in relation to unstructured data. I would tend to use it to deal with open-ended input from interviews and questionnaires. However, as I mentioned earlier, I used it on human rights tribunal cases. In terms of tribunal cases, I found that a significant amount of the analyzable data was in fact in the form of attribution. By now when I use the term "attribution," it should mean a bit more than its literal dictionary definition. I mean things like the nature of the discourse, the people implicated, the nature of the people, and nature of the implication. I found that cases tended to be about the justification and attachment of blame. I infrequently encountered quantitative data except in relation to awards for damages. If a data scientist were purely focused on the numbers - and I certainly don't deter such an individual from this focus - this would suggest that his or her services are limited in situations lacking numbers. A data scientist might choose to steer clear of conceptual attribution, which I admit is quite different compared to statistics or physics. Yet I remain under the impression that all sorts of customer service problems are related to complexity of attribution; the field of customer service represents an important market full of unstructured data. The absence of structure in customer service data does not negate the involvement of data scientists - just those that do not wish to deal with unstructured data.
The inability to detect important things is a problem more related to recognition than detection per se. A company is exposed to facts all the time; but it might not be aware of when these are important. Moreover, the facts that receive the most attention might be the least important. I find it easy to appreciate how the attachment of importance or significance can represent a major business concern. Decontextualized data is particularly problematic for an organization because it doesn't convey much about the contributing factors or resulting outcomes: it might be unclear how the data is important, in what way for which organizational functions. I want to point out how attribution over data probably has to be sophisticated to manage a complex organization if management capacity is limited. I say probably because much also depends on the nature of the business and other technologies already present. However, in general, lack of clarity in attribution likely requires more adventure among interventionists. There would be more guessing, risk-taking, and all sorts of gut-instincts. For effective change, it is necessary to know what to change. On the other hand, as opposed to the gamblers, there are foragers. Data scientists should not get into the habit of collecting and analyzing extremely large amounts of data as if prospecting for gold or fishing. Why? Well, although I don't dispute the possibility of "getting lucky" through mining activities, it is difficult to build a business model on happenstance. I just don't believe that people would finance expeditions where success might occur by accident.
In academia, it seems perfectly fine to examine situations without necessarily bringing about beneficial action. I am not saying that beneficial action doesn't sometimes occur. But an intellectual pursuit does not have to bring about change. Quite the contrary, I think that activists in academia are sometimes penalized as if their activism might somehow taint their intellectualism. On the other hand, in business, an intellectual pursuit that doesn't lead to beneficial change will likely be terminated for being superfluous. I have noticed certain levels of anti-intellectualism among some in the business community. Consequently, I don't really write about "social disablement" per se when I expect a fair number of readers to be business-oriented. I might pose the situation as systemic "data attribution failure." Yet data is a key source of enablement and disablement in society. Where there are alienated people, I expect to uncover alienated data. It is almost as if the data doesn't take real people into account. In my previous blog, I wrote about companies becoming disassociated from the market. It is rather esoteric to suggest that an organization might somehow become gradually distanced from the people wandering about in its buildings. I really mean that the data is becoming alienated: this can occur due to increasing complexity but also because of huge amounts of disconnected data. To arrive at solutions and prevent fanciful wayward adventures, the attribution models for the data should be firm. I would go so far as to suggest the following: it is not the quantity of data but rather the quality of attribution that will determine the effectiveness of strategic initiatives.