I sometimes ask myself while musing over the need for a particular service, "I wonder if this is commercially viable?" If the service is routine and the required software is rather inexpensive, perhaps there is little need for a company to outsource. I cannot think of any company that would routinely outsource work normally performed on a spreadsheet. I suspect that decades ago some companies hired specialists to handle spreadsheets; this seems much less likely today in light of the proliferation of basic computer skills. The service that I have in mind and which I will be discussing in this blog relates to my "crosswave differential" algorithm. I have an approach that - although described in detail in past blogs - probably remains quite obscure today. While thinking about providing a bulk service involving the algorithm, it occurred to me that my biggest hurdle might not be technical but rather concerns over privacy. A "bulk service" provider should care about the data (processing it properly) but not necessarily the details (such as who the data involves). In this blog, I will be explaining how I designed a system of symbolic expression in an effort to conceal the details of client data from the service provider - i.e. from myself. However, I will be opening the discussion to also consider the challenges posed by symbolic expression more broadly.

Having read the introduction about a dozen times, I think that it can be interpreted on a few levels. So I just want to clarify that I am interested not so much in "expressing" things using symbols - e.g. let C = 5 where C is an expression of 5. Rather, my interest is in what happens when we assert that C = 5 and C relates to illness: clearly illness doesn't equal 5. In persisting with the symbolic reduction, it becomes quite difficult to engage the illness at a more complex level. The symbol is primitive while the underlying phenomenon is a data object that remains to be fully articulated. The symbol is imposed externally while the object extends from the person. Symbolic reduction therefore has the effect of disengaging the analysis from the complexities inherent in the phenomenon. This creates a challenge since one is forced to deal with problems using primitives; this would not normally be a great concern until one deliberately remains with primitives irrespective of need. In order to compensate, since the context cannot be conveyed by the primitive, the user must maintain the context from the outside. So apart from the topic of symbolic expression, there is the nagging question of compensation brought about by the use of symbols. This blog will flow from the former to the latter.

Submission Design

"Trust Unnecessary" is the term that I have chosen to describe a method of conveying certain structural details of data without including any identifying facts. I designed this approach to encourage potential clients to submit data regardless of the exact nature. I suspect that many clients are interested in determining whether or not service providers have beneficial solutions; however, these same clients do not necessarily want to expose their personal lives to complete strangers or assume the risk of sharing sensitive organizational information. Research indicates that - perhaps due to the stigma associated with having a disability - it is common for people with disabilities to conceal their difficulties in the workplace. I would extend this idea to organizations. Companies attempting to sort out or resolve internal problems seem unlikely to share details unless it is absolutely necessary. I believe that the inability to ensure privacy for clients limits growth in the data services industry. With Trust Unnecessary, clients can maintain privacy. However, the move to increase privacy can sometimes create difficulties engaging underlying phenomena.

Trust Unnecessary resembles accounting, although the resemblance is superficial. In accounting, there are accounts; and there are amounts that are entered onto the accounts. In conventional accounting, the entries are characterized as credits and debits. Presented below are the types of lines characteristic of Trust Unnecessary. Many people can probably relate to symbols like /1829 and /1846 standing for accounts. However, in Trust Unnecessary, these symbols might be better described as themes or thematic constituents. If the themes are focused on organizational departments, /1829 might mean "Shipping Department" while /1846 could stand for "Production." Many accountants likely find such assertions reasonable. How about /1829 for "Paul" and /1846 for "Jean" - still not too bad? Well, maybe /1829 could stand for "Excellent"; /1846 for "Satisfactory"; /1844 for "Unsatisfactory"; and /1838 for "Terrible." An account does not have to be financial since the events associated with it are not necessarily financial events. An accountant stores financial information in accounts, but the devices that they use are not necessarily limited to accounting.

/1829 /1846 /1844 /1838 #these are account symbols
12 15 46 32 48 53 49 25 #these are amount symbols

On the second line, 12, 15, 46, 32, 48, 53, 49, and 25 are amounts albeit not in a traditional accounting sense. For example, 12 might signify "full moon" while 15 stands for "peanut butter sandwich." I suppose I should use a more realistic example. /1829 might mean "homicide case"; /1846 "drive-by shooting"; 12 "female student"; 15 "gunfire"; 46 "student residence"; and so forth. Clearly the accounts and the amounts can mean a great deal although I as the person handling the data can remain in the dark on the exact details. I don't have to know the nature of any data that is submitted. Both the accounts and the amounts are merely symbolic references. The benefit of this approach is that the client can keep the nature of the data private. The client owns the coding reference that explains the numbers. The data scientist need never possess the reference. Even if the data is lost or stolen, it is not particularly useful to anybody else. On the other hand, the person or organization submitting the data has to take responsibility for maintaining the coding reference.
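The division of labour described above, where the client holds the coding reference and the service provider sees only symbols, can be sketched in a few lines of Python. This is an illustration only and not the code behind Elmira or TUFF: the class name CodingReference, the function export_record, the sequential numbering scheme, and the layout of one account line followed by one amount line are all assumptions made for the example.

```python
from itertools import count


class CodingReference:
    """Hypothetical client-side lookup between readable tags and opaque symbols."""

    def __init__(self, start=1):
        self._next = count(start)   # next unused symbol number
        self._symbol_for = {}       # tag -> symbol
        self._tag_for = {}          # symbol -> tag

    def encode(self, tag):
        """Return the symbol for a tag, assigning a new one on first use."""
        if tag not in self._symbol_for:
            symbol = next(self._next)
            self._symbol_for[tag] = symbol
            self._tag_for[symbol] = tag
        return self._symbol_for[tag]

    def decode(self, symbol):
        """Translate a symbol back to its tag (only the client can do this)."""
        return self._tag_for[symbol]


def export_record(accounts, amounts, account_ref, amount_ref):
    """Build one account line and one amount line containing symbols only."""
    account_line = " ".join(f"/{account_ref.encode(a)}" for a in accounts)
    amount_line = " ".join(str(amount_ref.encode(a)) for a in amounts)
    return account_line, amount_line


# The starting numbers are arbitrary; they merely echo the sample lines above.
account_ref = CodingReference(start=1829)
amount_ref = CodingReference(start=12)
lines = export_record(
    ["homicide case"],
    ["female student", "gunfire"],
    account_ref,
    amount_ref,
)
```

Only the two strings in lines would be submitted. The CodingReference objects stay with the client, which is why lost or stolen submissions reveal little. Sequential numbering is the simplest choice here; a client worried about leaking the order in which tags were created could assign symbols randomly instead.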

Some readers might interject at this point that having a lot of symbolic references doesn't actually serve a useful purpose since a data scientist wouldn't know what to do with them. This is incorrect, however, as I will demonstrate in a moment. The crosswave differential algorithm can make use of symbolic references in a fairly productive manner. I feel that the algorithm has so much potential that I had to offer a bulk data processing service for those interested. I don't intend to use this blog to cover my service. I ask readers to focus on the methodology and the rationale behind the service. In particular, I am actually writing about the challenge of using Trust Unnecessary without considering how the symbols are constructed. Some of the problems will be reduced when I make Elmira open-source: this application allows users to work in the vernacular while conveying data using TUFF - the Trust Unnecessary file format. More will be mentioned about Elmira shortly.

Steve Jobs and Embodied Expressionism

My use of symbols or numbers in the preceding sample lines of code conceals a great deal about the data. I want to demonstrate how creating and maintaining an "embodied" database depends a great deal on symbolic construction. Consider the idea of embodied expressionism in data. Codified narrative or narrative data is "embodied" in that the person shapes the nature of the data. For example, if I drive to a checkpoint on a secure road, and a guard or officer goes through a checklist to determine if I should pass, who shapes the data? Is it me? Although I am a matter of concern to the data, I am actually subjugated or subjected by its formation. I guess this is a bit ironic to say. I should say instead, I become the object of the data. I am objectified. When Research in Motion was marketing its Playbook, I remember the advertising suggesting (to me) that the product is cool. I reasonably inferred that I could become cool by using the Playbook. Research in Motion didn't have the foggiest idea about me or what I am about. I was objectified by the advertising. Steve Jobs on the other hand was focused on the idea that the product is an extension of the individual: I could finally be myself through Jobs's products.

Consider some broader social discourse surrounding expression. The LGBT community can probably give insights on the embodied nature of expression. Society sometimes defines a person's identity and practically imposes on the substance of the individual. Similarly, in data, an organization can "project" meaning such that the data becomes what the organization means it to be - not what it necessarily is. On the other hand, a member of the LGBT community, when expressing the nature of the person does so not merely on a political level but embodied. The terms of expression are not set by the broader society but rather the individuals that help shape the conversation in the community. To me during pride parades and other related events, many efforts are made to "articulate" meaning from within the person. The media has an interest in the colourful shows and costumes. I am drawn to the conversation. These are the warriors at the frontline of human expression in a literal sense - in the sense of what it means to be a person in a world that expects compliance. The philosophy of human extension - where things extend from the person rather than being imposed by society - is something that I recognize in Jobs.

The symbols or tokens for accounts and amounts can subjugate or liberate the person. In our society, quantification tends to mean subjugation of the person. I don't mean the physical person per se but the embodied ontological concern. When I use numbers for the submission of code, my underlying intention is liberation. But this is barely apparent due to symbolic reduction. In order to demonstrate what I mean, I will deconstruct the tokenized code so that some of the underlying details can be inspected. Since I don't have to worry about a data scientist peering into my private life, I generally make use of tags that sort of resemble English. Elmira is an application that has the ability to maintain codified narrative; it can also "export" the data into TUFF for the bulk services. (Alternatively, people can create their own lines of code using a text editor or their own application. I only mention Elmira since I plan to make it open source.) Consider some real-life tags from Elmira posted below. The following are "amounts" . . . although I call them "events" on Elmira. The suffixes or declensions _m, _a, _e, and _o stand for morning, afternoon, evening, and overnight respectively.
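The time-of-day suffixes lend themselves to mechanical handling. The following is a minimal sketch in Python of how a suffix could be separated from an event tag; it is not taken from Elmira. The function name parse_event_tag and the lookup table TIME_OF_DAY are assumptions for illustration, and the square-bracket notation follows the tag [use.margarine_e] that appears later in this post.

```python
# Suffix meanings as stated in the text: _m, _a, _e, _o.
TIME_OF_DAY = {
    "m": "morning",
    "a": "afternoon",
    "e": "evening",
    "o": "overnight",
}


def parse_event_tag(tag):
    """Split an event tag such as '[use.margarine_e]' into (name, period).

    Hypothetical helper: raises ValueError when the tag has no
    recognised time-of-day suffix.
    """
    body = tag.strip("[]")
    # rpartition splits on the LAST underscore, so names that contain
    # underscores themselves are kept whole.
    name, separator, suffix = body.rpartition("_")
    if not separator or suffix not in TIME_OF_DAY:
        raise ValueError(f"no recognised time-of-day suffix in {tag!r}")
    return name, TIME_OF_DAY[suffix]


parsed = parse_event_tag("[use.margarine_e]")
```

Keeping the period as a suffix rather than a separate field means each event and period combination becomes its own symbol on export, so the time structure survives only in the client's coding reference.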


I hope the tags are fairly understandable even without a coding reference. Clearly I am a person with a great deal to hide from society. I eat things like watermelon and raisin bread, which I suppose some people might regard as subversive. The "accounts" have a different appearance as shown below. By the way, <HeadPainGreat> is more like "I'm feeling great - I have no head pain" rather than "I'm having a great deal of head pain." Apart from demonstrating how the data is focused on the individual, I provide this brief snippet into my life to show how the underlying details of personal data can indeed be concealed from the bulk service provider once it is converted into TUFF. Although these tags are more descriptive than primitive symbols, nonetheless I maintain structural capital outside the data to help me understand the intended meaning: each tag is "anchored" to a description to help ensure consistent application.


These invocations are much more about me than an outside evaluator - i.e. the tags are internally extended rather than externally defined. However, it remains important for me to be sensitive to proximal deficiencies in the ontology. For example, if upon tabulation I discover that [use.margarine_e] seems to be connected to <SleepBroken>, I might conclude that margarine prevents me from sleeping properly. But this is not necessarily the case since quite literally I am using the margarine on something; and it is this other thing that might be more related to sleeping difficulties. Or perhaps the invocation of [use.margarine_e] is in parallel with life circumstances congruent with sleeplessness albeit undefined in its involvement. It is therefore hazardous to handle the data in a purely deductive manner as if the data collected contains everything there is to know about the situation.

Expression of the self as opposed to imposition by a colonizing force seems to involve dichotomous dynamics; it is therefore ironic for the distinction to dissolve during the process of symbolic conversion. Moreover, any kind of narrative cohesion in the tagging disappears as numbers are invoked to express all things. The user might at some point lose all sense not just of identifying details but also the structure of the data. I therefore suggest that the coding reference must not merely provide codes but rather present the structure that the coding is meant to convey. I provide an example of how to configure some coding references below in order to enable the retention of structural meaning in tags that are structurally evasive: for example, "/25 111 134" means that eating a "pie slice" (overnight) and "muffin" (in the afternoon) led to a "terrible" evaluation during data collection (overnight). There are different ways to be creative about structure even while invoking primitive symbols.
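One way to keep structure in the coding reference, rather than a flat list of codes, is to store a small record for each symbol. The sketch below is written in Python under stated assumptions: the field names kind, label, and period are invented for the example, and the assignment of 111 to "pie slice" and 134 to "muffin" is inferred from the order in which the text lists them. It decodes the line "/25 111 134" from the paragraph above.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of a hypothetical structured coding reference."""
    kind: str     # what sort of thing the symbol stands for
    label: str    # the readable meaning
    period: str   # time of day attached to the symbol


# Accounts are written with a leading slash in the submitted line.
ACCOUNTS = {25: Entry("evaluation", "terrible", "overnight")}
AMOUNTS = {
    111: Entry("eat", "pie slice", "overnight"),
    134: Entry("eat", "muffin", "afternoon"),
}


def decode_line(line):
    """Return (account entries, amount entries) for one line of symbols."""
    tokens = line.split()
    accounts = [ACCOUNTS[int(t[1:])] for t in tokens if t.startswith("/")]
    amounts = [AMOUNTS[int(t)] for t in tokens if not t.startswith("/")]
    return accounts, amounts


def describe(line):
    """Render a line of symbols as a readable sentence fragment."""
    accounts, amounts = decode_line(line)
    events = " and ".join(f"{e.kind} {e.label} ({e.period})" for e in amounts)
    outcomes = " and ".join(f"{a.label} {a.kind} ({a.period})" for a in accounts)
    return f"{events} -> {outcomes}"


summary = describe("/25 111 134")
```

Because each entry carries its kind and period, the client can regroup symbols by kind or by time of day without having to remember what 111 or 134 meant.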

Other Aspects of Social Disablement

Just in passing perhaps as a tribute to Jobs, I will mention some other points related to individualism (and corporate sales). Jobs wanted the product to be an extension of the individual. But why might this be important? A business argument against the individual sort of goes like this: given a bell curve of needs and desires, a company should target the majority and ignore the asymptotic minority. This argument could have been used to prevent a company from developing personal computers; because in the early years of computing, hardly anyone used computers. In discussions of social disablement, there is this idea that society should invest its resources for the benefit of the "fit" and "normal" majority, ignoring the portion that is "sick" and "abnormal." The world's experience with Jobs suggests that preconceptions of normalcy are illusions; organizations by ignoring the fringe do not favour the majority or the most lucrative markets per se; but rather, they serve and reinforce the status quo even up to the point of being pushed out of their markets. The best guidance for an organization might be from the asymptotes.

Another idea from social disablement relates to the importance of dealing with people based on their unique needs. If people were all the same, it wouldn't be necessary to have all sorts of apps or products with different shapes, colours, and capabilities. When a researcher approaches a client with a "client satisfaction survey," the response will gain expression only through the survey. The person who designed the survey likely has certain preconceptions of normalcy dictating the parameters of expression. The symbolic tokens or tags might lack adequate structural sophistication to describe anything beyond the most visceral, immediate, and common client sentiments. Apart from being sensitive to how information is distributed over the bell curve, the exact manner in which expression is supported or implemented should be a concern. Because the "bell-ness" of the curve might be an ontological construct.

More on the Algorithm Itself

The blog that I wanted to deliver has finished. The balance is tangential. I now hope to give readers a brief overview of the algorithm. In relation to a particular health concern as expressed by a series of accounts, the processing system that I use interprets each incident involving an amount as either "treated" or "untreated." For example, given the question of whether I slept well after eating celery, eating celery later associated with broken sleep might be classified as "treated" for <SleepBroken>; incidents of not eating celery could count as "untreated" for <SleepBroken>. The following is an examination of a real-life health concern that I finally decided to address using the algorithm. I sorted the resulting crosswave differentials from lowest to highest.
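The treated and untreated split can be illustrated with a short tabulation. This Python sketch reflects one reading of the paragraph above and is not the actual processing system: it assumes each day is recorded as a set of events and a set of invoked accounts, that a day counts as "treated" when the event occurred and "untreated" when it did not, and it uses the hypothetical tag eat.celery_e together with made-up sample days.

```python
from collections import Counter


def split_incidents(days, event, account):
    """Tally days by (treated/untreated, account invoked or not).

    days is an iterable of (events, accounts) pairs, each a set of tags.
    """
    tally = Counter()
    for events, accounts in days:
        group = "treated" if event in events else "untreated"
        outcome = "invoked" if account in accounts else "not_invoked"
        tally[(group, outcome)] += 1
    return tally


# Made-up sample days, for illustration only.
days = [
    ({"eat.celery_e"}, {"SleepBroken"}),
    ({"eat.celery_e"}, set()),
    (set(), {"SleepBroken"}),
    (set(), set()),
    ({"eat.celery_e"}, {"SleepBroken"}),
]

tally = split_incidents(days, "eat.celery_e", "SleepBroken")
```

The tally only separates the two groups; how the crosswave scores are then derived from each group is not covered by this sketch.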

The crosswave "differential" is the result of the "difference" between two crosswaves. There is a crosswave pattern for treated and another for untreated. Just to show the mechanics operating in the background, consider the next image which shows the crosswave scores for untreated. Except for one notable abnormality which I understand but cannot elaborate here (since I have a really detailed explanation), untreated items exhibit readings between 2.5 and 3 in this particular situation. Since the score can be from 0 to 6, it would be fair to say that untreated items "normally" tend to be distributed "around the middle." This distribution around the middle occurs because the algorithmic scores tend to balance out. There is nothing radical about taking the difference between treated and untreated as an indication of effectiveness. However, the crosswave pattern is a bit unusual in that it is based on a gradient: e.g. treated versus untreated for "poor, fair, average, good, and better." The rationale here is that effectiveness is relative. A treatment for one person might result in better from average; but for somebody else, the result might be average from poor. Or this might be the same person having changing life circumstances such as age, occupation, or disability.

On the other hand, treated items tend to occupy the full spectrum as shown below. I personally find this fascinating. I guess the chart would be more interesting if readers were aware of the actual health concern being studied. I haven't changed the sorting of the data. The untreated distribution above is associated with the treated distribution below. These charts are based on real data. Real life sometimes results in choppy charts.

The treated crosswave less the untreated crosswave results in the crosswave differential pattern as shown below. Below-ambient performance is on the left (below 0) while above-ambient is on the right (above 0). How would the results normally be used? A logical response would be to avoid events on the left while preferring events on the right. Sometimes upon examining the distribution of a number of related events, it becomes apparent that a "type of event" seems to be biased towards the left or right; the response might then be to deal with events of a particular type in a particular manner. If some events on the left possess qualities of opposition to events on the right, then the focal-point of opposition might be at issue. For example, if a right-side item represents activity "on" while the left-side item is the same activity "off," the state of that activity affects the outcome in a clear manner.
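The subtraction and sorting step can be written down directly. The Python sketch below covers only that final step: it takes treated and untreated crosswave scores as given inputs, since how those scores are computed is not described in this post. The function name crosswave_differentials is an assumption, and the scores are invented placeholders on the 0 to 6 scale, keyed by amount symbols borrowed from the earlier sample line.

```python
def crosswave_differentials(treated, untreated):
    """Return (event, treated - untreated) pairs sorted lowest to highest.

    Only events that have both a treated and an untreated score are included.
    """
    shared = treated.keys() & untreated.keys()
    pairs = ((event, treated[event] - untreated[event]) for event in shared)
    return sorted(pairs, key=lambda pair: pair[1])


# Placeholder scores, not real results.
treated = {12: 1.0, 15: 5.0, 46: 2.75}
untreated = {12: 2.5, 15: 3.0, 46: 2.75}

ranked = crosswave_differentials(treated, untreated)

# Left of zero is below-ambient, right of zero is above-ambient.
below_ambient = [event for event, diff in ranked if diff < 0]
above_ambient = [event for event, diff in ranked if diff > 0]
```

With these placeholder inputs, symbol 12 lands on the left, symbol 15 on the right, and symbol 46 sits at zero, meaning no difference between treated and untreated. The service provider can produce this ranking without knowing what any of the symbols stand for.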

These charts were generated using 100 days of data - exactly the same data included in the zip file attachment (export.zip), actually. The "concern" is expressed in the file called 4118. I plan to give exact details on how to configure concern on a separate website.

Readers might be wondering, has the "health concern" improved? Well, it's a peculiar story. I had the health concern for about 10 months. For the longest time, I didn't want to use the algorithm to help me reason things out. I should have been born a Mennonite. A part of me is extremely anti-technological and really hates computers. But finally I took a look at the data and came up with a number of measures that have significantly improved the situation. Not only this, but I developed a theory to help explain why the measures might be effective. (The measures seem to be effective. I just didn't have a reason why.) Now, having a plan and actually sticking to it are two completely different things. A person needs will-power and commitment to stick to a plan. I am not aware of the algorithm helping to give me these things; although perhaps I should give it some thought. If I could bottle will-power and commitment, then for sure I would have an interesting blog to write.

Despite my use of a health concern as an example, I will not accept data intended for the treatment of disease since I am not a medical doctor. To the extent that choices can contribute to disease, choices can probably prevent disease, too. But by the time a condition is called a "disease," the conversation might not be about choices but the consequences of bad choices - such as getting cancer from smoking. I have significant doubts about the extent to which an algorithm making use of day-to-day choices can effectively ameliorate "consequences." This leaves the use of the algorithm on non-quotidian choices such as the use of medication; and because I am not a medical doctor, I discourage the use of the algorithm in relation to a medical regimen. In my personal example, I didn't use the algorithm on a medical regimen but on common everyday choices pertaining to diet and certain supplements. I already know the likely cause of the health concern. It's related to my occupation. Since it is counter-productive to think of gainful employment as a source of disease, there is really nothing to "cure." I had to find a non-medical solution to help me adapt to "permanent developments," which is my euphemism for old age.

I have been hearing over the years that doctors are beginning to lose the ability to help patients fight certain types of disease due to drug resistance. We add to this the fact that the population is rapidly aging; and there isn't actually a cure for old age. Also, there are declining resources for healthcare in many countries. I will let all of that play out at its own pace. My intention for the algorithm is more for business. It is an experimental approach. I believe there is still room for experimentation among different business interests. My direct involvement will be fairly limited since computers are meant to do most of the work. I design and enhance the code.

Embodied Expression Versus Symbolic Imposition

The fact that there is so much data in the world conceals a dark reality. A fair amount of the data that exists today is "control data." It is data that is meant to control something - processes, people, and capital. For example, in an accounting department, it would be difficult to suggest that much or any of the data is about expression of the person. The symbols that exist on paper are there to impose meaning and to ensure outcomes. When people and organizations submit data, there is a good chance that somebody is trying to or will eventually gain control. These power dynamics on one hand do indeed increase the likelihood of control, but this is at the cost of limiting individual expression. When solutions are not easily forthcoming, it is important to seek a better understanding of underlying phenomena - to make algorithmic approaches sensitive not to the symbols gaining a ubiquitous foothold over reality but rather the quiet victim in the body. In closing, I have some images of aliens, alienation, and alien encounters. The fact that people are sometimes forced inward might indicate that there are fewer places for them outside - that they are an "endangered species" in a manner of speaking. Once symbols become instruments of imposition rather than expression, data loses its meaning and value.


Tags: accounting, alienation, articulation, assignment, client, compensation, deconstruction, disease, efficiency, elmira, file, format, healthcare, management, medical, ontology, opensource, operational, optimization, organizational, phenomena, privacy, production, projection, reduction, representation, research, scientific, security, social, symbolic, symbols, trust, tuff, unnecessary

