Following the Odour of Data – Catching Scent

In recent blogs, I wrote about using codified narrative as a form of data. I also discussed using attribution models to systematically evaluate codified narrative for ontological constructs: e.g. “child abuse,” “physical confinement,” “cannibalism.” I provide a brief overview of these topics a bit later in the blog. The third important piece in making use of narrative data involves “attribution profiling” in a process that I call “catching scent.” Following the odour of data involves establishing a scent and then searching for it. After attribution models create profiles from the codified narrative, I have the search engine hunt for similar profiles. I don’t consider the process particularly original or creative. But on Elmira, the application that I developed to manage codified narrative, catching scent is easy to do. I might create a profile called “Spooky” or “Positive Customer Experience” or “Terrorist Threat.” I then have Elmira grind away in search of similar cases. Below I present a distribution of cases that I compared to a Japanese movie called “Goke, Body Snatcher from Hell” (Shochiku 1968). The movie itself is the last entry at the far right, as confirmed by the perfect match.

In a nutshell, Goke is about an alien race of blob-like creatures that can inhabit human hosts. These hosts then survive by sucking the blood of their human victims. Goke therefore combines body snatchers, the blob, and vampires to create a fairly threatening extraterrestrial species. What other movies on the narrative database are similar to Goke? The “most” similar is a movie called “300 Rise of an Empire” (Warner Bros. 2014). What does Rise of an Empire have to do with alien invasion? Rise of an Empire has everything to do with alien invasion. The invasion was led by Xerxes against Western civilization as it existed at the time, championed by the Spartans and eventually a united Greece. I am a little bit rough on my Herodotus. Off the top of my head, I cannot say if the story is faithful to history. I draw my plot similarities from the movie itself. The second-most similar movie is “Dawn of the Planet of the Apes” (Twentieth Century Fox 2014); this is yet another movie about an alien force entering an occupied habitat – in this case, humans moving into lands controlled by the apes. Elmira picks up on the invasion motif.

Among the least similar movies is “Atlas Shrugged Part I” (Twentieth Century Fox 2011) about corporate espionage and the decline of the American Dream. “A Haunted House 2” (IM Global 2014) is a comedy about a black man who – after marrying a wealthy white woman – forms a crude sexual relationship with a demonic doll. “SWAT” (Columbia Pictures 2003) is a pretty lame movie about becoming members of SWAT – a sort of bildungsroman. While I haven’t reviewed the attribution breakdowns for these three movies, I suggest that in fact they are nothing like Goke. So Elmira’s catching scent routine – officially called “Sniffing.class” – seems to deliver worthwhile results.

How Elmira Does What It Does

From a movie or any other source of data, the user has to extract “codified narrative.” This process might be compared to retelling the story in symbolic terms – literally using symbols. A “symbol” is a fixed series of characters intended to carry persistent meaning. Symbols don’t have to resemble English terms: for instance, they can be entirely numerical. The question of whose symbolic terms to use and how exactly to do the retelling are topics worthy of in-depth discussion. I will postpone lengthy elaborations for later blogs. I will just mention that on my Universal Data Model – introduced by me in a blog way back in 2014 – I characterize myself as a “Type IV” storyteller: this is a specialist focused on a particular type of narrative, often for research purposes. My codifications concentrate on my areas of expertise. This is probably one of the reasons why Elmira is able to distinguish between cases involving foreign incursion: it is because of my research alignment. Although people can always make use of the work that a specialist has done, expanding the database through the addition of code might be challenging for the casual user. Below are some lines of codified narrative from Goke.

[find.of_inv=*massinnocent_fatalities*~_by *person_pilot*_by *person_stewardess*_sur *alien_invasion*]
[return.of_inv=*person_self*~_by *person_stewardess*_by *person_pilot*_von *setting_remotenowhere*_bis *system_cityhighway*]
[share.of_per=*future_plans*~_by *alien_species*_pour *alien_invasion*_pour *deliberate_genocide*]
[speak.of_und=*alien_species*~_proxy *person_wife*]
[blooddrink.of_per=*person_widow*~_by *person_stranger*]
[protect.of_inv=*person_stranger*~_like *person_husband*_like *manner_similarinjuries*_by *person_widow*_about *death_husband*]
[shoot.of_inv=*person_pilot*~_by *person_widow*_pour *protection_security*]
[kill.of_per=*people_passengers*~_by *person_stranger*_mit *successional_demise*_mit *blood_drinking*]
[use.of_per=*person_wife*~_by *person_executive*_pour *method_seduction*_pour *business_funding*_on *person_politician*]
[force.of_per=*person_stranger*~_by *person_alienentity*_pour *human_blood*_pour *serial_killing*_pour *successional_demise*]
[take.of_per=*person_stranger*~_by *person_alienentity*_mit *body_head*_mit *forced_entry*]
[crash.of_und=*movement_airplane*~_bis *place_surfaceworld*_bis *setting_remotenowhere*]
[cause.of_it=*technological_failure*~_by *movement_spaceship*_of *movement_airplane*]
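
To make the layout of these lines easier to see, here is a minimal Python sketch that splits one line into its parts. This is not Elmira's parser. It assumes, purely from the samples above, that each line has the form `[action.of_role=*object*~_tag *literal*_tag *literal*...]`; the names `parse_event`, `action`, `role`, `object`, and `tags` are hypothetical labels chosen for illustration.

```python
import re

# Assumed layout, inferred only from the sample lines:
#   [action.of_role=*object*~_tag *literal*_tag *literal*...]
LINE_RE = re.compile(
    r"^\[(?P<action>\w+)\.of_(?P<role>\w+)=\*(?P<object>\w+)\*~(?P<tail>.*)\]$"
)
# One "_tag *literal*" pair; the leading underscore is treated as optional.
TAG_RE = re.compile(r"_?(?P<tag>[a-z]+) \*(?P<literal>\w+)\*")


def parse_event(line):
    """Split one codified line into its action, role, object, and tag pairs."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not fit the assumed layout: " + line)
    tags = [
        (pair.group("tag"), pair.group("literal"))
        for pair in TAG_RE.finditer(match.group("tail"))
    ]
    return {
        "action": match.group("action"),
        "role": match.group("role"),
        "object": match.group("object"),
        "tags": tags,
    }


event = parse_event(
    "[share.of_per=*future_plans*~_by *alien_species*"
    "_pour *alien_invasion*_pour *deliberate_genocide*]"
)
print(event["action"], event["object"])
for tag, literal in event["tags"]:
    print(tag, literal)
```

Keeping the tags as an ordered list of pairs, rather than a dictionary, preserves repeated tags such as the two `_pour` entries in the example.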

Note: Just to add to the confusion, I also use the term “symbol” to refer specifically to the “_item” tags above. Earlier, I was discussing symbols on a more conceptual level: e.g. *person_alienentity* is symbolic (although I don’t call it a symbol). It symbolizes in this case an alien entity embodied as an individual. I call the symbols in ** stars “literals.” There is a historical reason for this use: at one time, I used “” quotes rather than stars.

Codification can be interpreted on different levels. It can be purely behavioural – and in this vein simplistic rather than sophisticated. For example, if a perpetrator “shoots” a victim, the user might say this but nothing beyond: [shoot.of_per=vic]. On the other hand, in the case of a strafe, the shooter is releasing a barrage of gunfire while in motion: in the image below, the red circles indicate the gunshots aimed at the target. Similarly, the target might be attempting to evade gunfire. There is quite a bit of movement that can be lost in a superficial retelling. The codification method that I use – called BERLIN (Behavioural Event Reconstruction Linguistic Interface for Narratives) – can handle strafes. I would even say that its design is premised on strafing action – that is to say, a main action with tangential meanings en route. The user, apart from indicating that a shot occurred, can be really specific. He or she can add details that relate to the action: intent, rationale, mental state, alongside whom, in order to achieve what. By the way, on BERLIN, the strafing can be applied to all sorts of actions rather than just shooting: e.g. gazing, hollering, ordering, and cursing.

Once codified narrative is in place, it is necessary for Elmira to systematically evaluate the code. This process is made possible through the use of attribution models: events within the narrative confirm the existence of particular phenomena through these models. A “model” on Elmira is something asserted by the user. An important point – although it might not immediately make sense – is that a model doesn’t have to be entirely correct all the time in order to be useful. Below, I list the events contained in an attribution model called “War.” Not all wars in real life contain these events. (Moreover, some wars in real life likely contain events not listed in this particular attribution model.) A war can be initiated and fought even in the absence of a <doomsday_weapon>, sometimes called a weapon of mass destruction. However, the presence of a doomsday weapon can be reasonably associated with certain wars.

<resistance_fighters>
<organized_resistance>
<armedtactical_response>
<armed_suppression>
<organized_execution>
<murderous_orders>
<martial_combat>
<primary_mission>
<doomsday_weapon>
<major_fight>

Below are events for an attribution model called “Invasion.” Notice that <armed_suppression> appears both in War and Invasion. If a narrative contains armed suppression, there might be a war; or there might be an invasion; or there might be both. Does terrorism involve subway stations? Yes, sometimes. It isn’t necessary to take detailed stats to confirm whether or not assertions in the narrative are consistent with past experience. I am certain that before 9/11 occurred, somebody was working purely on narrative. So we should not dismiss the narrative – for it is the foundation of later action. Although I consider the idea of using statistics to deduce terrorist attacks foolish, it certainly isn’t my place to delegitimize learning experiences. I just hope people keep an open mind on the role of numbers. I don’t think that terrorists have teams of mathematicians or statisticians. Perhaps it depends on the particular organization. There could be a geek fascist regime – foretelling the future like those advanced humanoids on Fringe.

<disruptive_colinisation>
<superior_enemy>
<technological_mismatch>
<deliberate_genocide>
<alien_invasion>
<scarce_resources>
<overpopulation>
<armed_suppression>
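
One way to picture how a shared event can point to more than one model is the rough Python sketch below. The two event lists are copied from the models above, identifiers unchanged. Everything else is an assumption for illustration: the function name `model_coverage`, the rule that a model's score is the percentage of its events found in a narrative, and the sample narrative itself are not taken from Elmira.

```python
# Event lists copied from the War and Invasion models above.
WAR = {
    "resistance_fighters", "organized_resistance", "armedtactical_response",
    "armed_suppression", "organized_execution", "murderous_orders",
    "martial_combat", "primary_mission", "doomsday_weapon", "major_fight",
}
INVASION = {
    "disruptive_colinisation", "superior_enemy", "technological_mismatch",
    "deliberate_genocide", "alien_invasion", "scarce_resources",
    "overpopulation", "armed_suppression",
}


def model_coverage(model_events, narrative_events):
    """Assumed scoring rule: percentage of a model's events seen in the narrative."""
    if not model_events:
        return 0.0
    found = model_events & narrative_events
    return 100.0 * len(found) / len(model_events)


# Hypothetical narrative: two literals that appear in the Goke code,
# plus armed_suppression added here only to show the overlap.
narrative = {"alien_invasion", "deliberate_genocide", "armed_suppression"}

print("War:", model_coverage(WAR, narrative))            # War: 10.0
print("Invasion:", model_coverage(INVASION, narrative))  # Invasion: 37.5
```

Under this assumed rule, the single shared event registers weakly against War and more strongly against Invasion, so neither model has to be exclusively "right" for both scores to be informative.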

Once there is codified narrative in place along with attribution models, profiles need to be constructed. The computer-generated “attribution profile” for Goke is shown below. Sorry I didn’t bother to make this look nicer. Like I said, it is computer-generated. Apart from these profiles, the user is free to manually set the parameters. Perhaps a heated debate that people will eventually have – once they gain the ability to catch scent – involves the extent to which a profile should be defined by the user rather than extracted directly from the storyline. It is a given for example that regardless of my expertise on any particular subject, I cannot possibly know everything about all subjects. I might not know exactly what attributes to take into consideration. For example, if I hope to determine whether or not fraud is taking place, the act of defining it might render invisible the realities surrounding the problem; then my process of detection becomes an instrument of confirmation – entrenching my preconceptions of reality.

Abduct.txt at tests,6.666666666666664 to 100.0
Bones.txt at tests,1.1111111111111107 to 100.0
Buffy.txt at tests,17.27272727272727 to 100.0
Cannibalism.txt at tests,40.0 to 100.0
Confine.txt at tests,90.0 to 100.0
Death.txt at tests,4.285714285714285 to 100.0
Disability.txt at tests,18.57142857142857 to 100.0
Disaster.txt at tests,65.0 to 100.0
Dishonour.txt at tests,23.33333333333333 to 100.0
Genocide.txt at tests,90.0 to 100.0
Gore.txt at tests,0.0 to 100.0
Invasion.txt at tests,50.0 to 100.0
King.txt at tests,10.0 to 100.0
Ladyluck.txt at tests,6.666666666666664 to 100.0
Missing.txt at tests,56.66666666666666 to 100.0
Politics.txt at tests,6.666666666666664 to 100.0
Serialkiller.txt at tests,30.0 to 100.0
Snatcher.txt at tests,56.66666666666666 to 100.0
Splinter.txt at tests,40.0 to 100.0
Trapped.txt at tests,40.0 to 100.0
Uprising.txt at tests,6.666666666666664 to 100.0
Usurp.txt at tests,40.0 to 100.0
Vampirism.txt at tests,90.0 to 100.0
War.txt at tests,20.0 to 100.0
XFiles.txt at tests,12.222222222222221 to 100.0

The above profile indicates the bias that I programmed into the system. In Goke, there is an alien entity that sucks human blood. Consequently, the attribution model called Vampirism within the profile requires a match of 90 to 100 percent: it is due to the high level of certainty between Goke and Vampirism. On the other hand, the attribution model King (the rise of a leader) allows for a match from 10 to 100 percent. This means that King might be indicated even if cases barely have the required events. A high level of uncertainty leads to the greatest range for matching purposes. Clearly there are many different ways to structuralize the distribution: another approach is to simply select cases that “closely” resemble the attribution profile +/- 10 percent. By the way, the matching described here relates only to case “features.” Matching can also take place on case “behaviours” and “settings.” The construction of attribution profiles can therefore be quite an in-depth topic.
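
The range-based matching described above can be sketched in Python as follows. This is a guess at the general idea rather than a description of Sniffing.class: the function names `parse_profile` and `match_percent`, the rule that similarity is the share of models whose score falls inside the allowed range, and the sample case scores are all assumptions. Only the profile line format is taken from the listing above.

```python
def parse_profile(text):
    """Read lines such as 'Vampirism.txt at tests,90.0 to 100.0' into ranges."""
    profile = {}
    for line in text.strip().splitlines():
        name_part, range_part = line.split(",", 1)
        name = name_part.split(".txt")[0].strip()
        low, high = (float(value) for value in range_part.split(" to "))
        profile[name] = (low, high)
    return profile


def match_percent(profile, case_scores):
    """Assumed rule: percentage of models whose case score lies within range."""
    if not profile:
        return 0.0
    hits = sum(
        1
        for name, (low, high) in profile.items()
        if low <= case_scores.get(name, 0.0) <= high
    )
    return 100.0 * hits / len(profile)


# Four lines taken from the Goke profile above.
goke_profile = parse_profile("""
Vampirism.txt at tests,90.0 to 100.0
King.txt at tests,10.0 to 100.0
Invasion.txt at tests,50.0 to 100.0
War.txt at tests,20.0 to 100.0
""")

# Hypothetical model scores for some other case in the database.
other_case = {"Vampirism": 95.0, "King": 0.0, "Invasion": 60.0, "War": 25.0}

print(match_percent(goke_profile, other_case))  # 75.0
```

A narrow range such as 90 to 100 acts as a strict requirement, while a wide range such as 10 to 100 lets almost any case through, which mirrors the certainty and uncertainty described for Vampirism and King.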

Using Quantitative Hybrids to Catch Scent

By introducing the idea of catching scent, I hope that I have demonstrated how the use of codified narrative can lead to fairly interesting applications. Catching scent can take place if codified narrative is collected. Where a more quantitative approach appears to be failing, where the absence of context seems to impede progress, I hope users consider my method described here, which is almost entirely driven by context. Although I have discussed movies in relation to codified narrative, it is possible to extract code from many different sources. I also want to point out that normal quantitative data can be embedded within the code. In order to embed quantities, consider the example below. In a moment, I will explain how catching scent is possible even if quantitative data is embedded in the narrative.

Regular codification: [shoot.of_per=vic]
Use of quantity: [shots_fired=15]

The first line is narrative. I call this a “proxy.” The second line assigns the value 15 to the variable “shots_fired.” It is quite easy to see how the quantitative expression morphed into the codified narrative, right? In fact, were it not for the victim (vic) at the end of the codified expression, the event [shoot.of_per] would receive a default value of 1 for compile purposes. When used as line objects, it becomes possible to add quantities for example as transaction amounts for account balances. This means that codified narrative can form the basis for a generic non-relational data collection system. Quantities do not impede catching scent operations that make use of “attributes”: this is because attributes are separate from behavioural events in the codified narrative. In plain English, this means that a person can maintain all sorts of quantitative data without necessarily interfering with catching scent – but only if attributional data is also present. In fact, all of the events can be quantitative if so desired by the user. Since the code would lose its ability to express narrative if it is fully quantitative, consider instead the idea of hybrid code containing both narrative and quantitative data.
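
The distinction between narrative and quantitative events might be sketched in Python as below. It follows the three cases described above: an event with a non-numeric right-hand side is narrative, an event with a number is a quantity, and a bare event defaults to 1. The function name `read_event`, the returned tuple layout, and the rule used to recognise a number are assumptions for illustration, not Elmira's compile step.

```python
import re

# Assumed test for a plain number: optional minus sign, digits, optional decimals.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def read_event(line):
    """Classify one bracketed event as ('name', 'quantity' | 'narrative', value)."""
    body = line.strip()[1:-1]  # drop the surrounding [ and ]
    name, separator, value = body.partition("=")
    if not separator:
        # Bare event such as [shoot.of_per]: default value of 1.
        return (name, "quantity", 1.0)
    if NUMBER_RE.fullmatch(value):
        return (name, "quantity", float(value))
    return (name, "narrative", value)


print(read_event("[shoot.of_per=vic]"))  # ('shoot.of_per', 'narrative', 'vic')
print(read_event("[shots_fired=15]"))    # ('shots_fired', 'quantity', 15.0)
print(read_event("[shoot.of_per]"))      # ('shoot.of_per', 'quantity', 1.0)
```

Because both kinds of event share the same bracketed shape, a hybrid file can mix them freely and still be read line by line.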

I consider hybridization most useful when lived experiences intersect with instrumental demands. Take for example “stress” as a lived experience, which I feel can be easily expressed as codified narrative. It is possible to interpret stress almost entirely in relation to “performance metrics,” which are not easily transcribed into narrative. Stress is not about performance per se. But there are people interested in performance who are also concerned about the impacts of stress. If only performance metrics were collected, one’s understanding of stress would be shaped solely by the lens of performance. If only narrative were collected as it pertains to the lived experience, performance would fail to be considered. Similar dynamics can be found between “demand” and “sales”: it is possible to collect endless amounts of sales data without understanding anything about the underlying forces triggering purchasing decisions. Catching scent remains an option even in this hybridized data environment.

It is possible to incorporate narrative and quantitative data in the same system because Elmira uses a method of association that I call “mass data assignment.” The exact nature of the data being assigned to the attributes isn’t too important in a bulk assignment scenario. As useful as assignments might seem, it is often desirable to have some means to support the systematic computer interpretation of events. That’s a separate concern however. If there are attributes, it is possible to catch scent. The configuration of the attributes can be simplistic for example almost like demographics for screening purposes. Or the attributes can make use of formal models. In my blog specifically about attribution modelling, I mentioned a model called the General Disablement Model (GDM v 2015), which I consider fairly elaborate. It is therefore possible to catch scent making use of formal constructs for ontological recognition.

Catching scent can take place in any setting. But in relation to hybridized data, it might be applied to situations such as the following: incidents of workplace accidents; employee performance evaluations; incidents of bullying; domestic violence and abuse; stories from refugee camps. Hybridization helps fill the cracks in data where lived experiences might be diminished or alienated by their contextual expression. The tug-of-war between the projected context and the articulated experience is unlikely to be balanced: analytical methodologies will tend to favour quantitative methods and therefore the projected context. Catching scent is a way of dealing with events in a “mostly” non-quantitative manner. This is not to say that it is a substitute for quantitative methods. I believe that quantitative dependence is overly decontextualized to the point of being alienating. Quantitative alienation has the effect of stripping people of their rights by quelling expression. It can cause interpretive distortions that misdirect resources. It can lead to poor customer service, reduced quality, and market disenfranchisement for companies. I therefore offer this alternative regime to place more emphasis on embodied experiences and consequences.