BERLIN stands for Behavioural Event Reconstruction Linguistic Interface for Narratives. I introduced BERLIN a few blogs ago - in my "final blog." Theoretically after one's final blog, no further blogs are forthcoming. However, I am now posting bonus blogs reflecting aspects of the same closing subject. Today, I will be elaborating on BERLIN's syntax and how its searches are facilitated. As a general rule, the objective of BERLIN is to convert human-friendly narrative into computer-friendly code. It isn't unusual for computer code to be expressed in a human-like language for the purpose of executing a program. A computer program has rigid parameters. BERLIN on the other hand is designed to support "expression." Rather than the code adapting to the needs of a computer program, it is adapted to the requirements of expression. BERLIN remains shaped by a type of program of sorts. The program is external to any computer operating environment. When we do things in our day-to-day lives, there are often programs running: going to work, shopping, stepping out for lunch, finding a parking spot - routines that we follow as part of our involvement in the "program." Codification of our involvement persists in our literature, films, poetry, music, and art - albeit not necessarily in a form that accommodates computer analysis. People are "programmed" in a process sometimes described as "social construction."
Before moving on to the issue of BERLIN's syntax, I want to briefly share how I believe narrative develops. To date, I have been able to extract narrative from all of the following: movies, television shows, news articles, the evening news, court cases, fairytales, and paintings. I provide an example narrative data-extraction a bit later under "Narrative Theory." For now, I will discuss just the general idea of a narrative. When people use the term "storytelling," I believe they actually mean "explaining the data." The ability to explain things is not quite the same as delivering a story. It is important to be aware of the distinction or difference. There is also research on how people go about the process of telling a story - and how stories gain shape among those in an audience. Consequently, a conversation about narratives can probably get quite esoteric. I'm certainly not the sort to turn away from such discourse. However, in the move from things fleeting and erudite to matters focused on data structure and syntax, it becomes necessary to nail down all sorts of points that would otherwise be free-floating. Moreover, to move from syntax to an actual software application, I'm afraid there is little room for flotation.
Therefore, I apologize in advance for rigidly imposing my own definition of a narrative. A narrative contains three main parts: A) settings and situations; B) behaviours and events; and C) changed settings and situations. Off the top of my head, I plug a story into the three parts: A) armed man, his knife, crowd of people; B) multiple stabbings and fleeing; C) several fatalities, injuries, physical arrest of perpetrator by police. How exactly the story is told would probably be addressed by those focused on the "telling." On the other hand, I concentrate more on the issue of inclusion. Inclusion is actually a data issue - i.e. what gets included as data - while the telling of a story is more stylistic and perhaps the product of literary convention. Inclusion relates to the question of "how" and "why" or "to what end?" The narrative from a data-centric standpoint is an ontological conversation of what counts and why. People can provide a variety of narratives if asked to do so, for example, about a car collision at an intersection. I believe that there is a tendency to conflate language skills and telling skills with differences in how people perceive the relative importance of things, how these items are arranged, and how the pieces fit together.
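To make the three-part structure a little more concrete, here is a minimal sketch in Python. It is not how Elmira stores anything; the class name, field names, and the second narrator's account are all my own assumptions for illustration. It simply shows that "inclusion" can be compared as data once each part is held separately.

```python
from dataclasses import dataclass, field


@dataclass
class Narrative:
    """Hypothetical container for the three parts of a narrative."""
    settings: list[str] = field(default_factory=list)          # A) settings and situations
    behaviours: list[str] = field(default_factory=list)        # B) behaviours and events
    changed_settings: list[str] = field(default_factory=list)  # C) changed settings and situations

    def included(self) -> set[str]:
        """Everything the narrator chose to include, regardless of part."""
        return set(self.settings) | set(self.behaviours) | set(self.changed_settings)


# The example story from the paragraph above.
stabbing = Narrative(
    settings=["armed man", "knife", "crowd of people"],
    behaviours=["multiple stabbings", "fleeing"],
    changed_settings=["several fatalities", "injuries", "arrest of perpetrator by police"],
)

# A second narrator's account of the same incident (made up for illustration).
other = Narrative(
    settings=["armed man", "crowd of people", "slow police response"],
    behaviours=["multiple stabbings"],
    changed_settings=["several fatalities"],
)

# Differences in inclusion: what the second narrator counted that the first did not.
only_in_other = other.included() - stabbing.included()
```

Comparing the two sets is the crude version of the point about inclusion: the interesting data is what one narrator counts and another leaves out.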
When a narrative is harnessed, it is the differences in inclusion that provide the interesting "data." Consider as a counterpoint the idea of counting something - the same thing - over and over again. For example, one might count the number of coffee canisters coming off a conveyor belt. Inclusion in this case is based not on differences but on conformity. The whole objective is to nullify the narrative - or to ensure it never changes. When the Apostles in the New Testament give an account of Jesus' final days, I think that many would agree that their stories together provide a much more worthwhile account than any single perspective. At question isn't what happened on a purely factual basis but how the people close to Jesus made sense of the events - and hoped to help others make sense. The narrative is more than a string of events. How does one maintain and make use of the data in a narrative? This brings about my own efforts and of course this blog, which provides a bit of a preview of software that at some point might become open-source.
Inherent Benefits of Narrative
Narrative is useful because it can be applied to all sorts of situations that would otherwise be data poor (from a quantitative standpoint). For example, songs often contain extractable narrative data; but they lack quantitative data. I put together a few major points to promote the use of narrative. Point #1: Human interaction is rich in narrative. (This means there is a lot of data.) Different narratives are possible from the same resources. Point #2: Narrative can often be collected without invading privacy. Point #3: Even fictional narrative is useful. Point #4: Narrative is powered by human imagination. Point #5: Narrative captures aspects of socialization and social construction. Thanks to BERLIN, narrative can also be retained as code for computer analysis; this gives an analyst the ability to systematically examine the individual components. In the Delphi method, experts within a particular field might be asked to assess a situation or condition. For example, renowned economists might be asked for their position on growth. Consider a challenge such as the following: "You are all great writers. We are asking you, how might the people in this city deal with a particular natural disaster?" This is more detailed than the Delphi method. The problem is how to make use of narrative in order to render a cohesive analysis.
Narrative Theory isn't part of BERLIN per se. I simply found that BERLIN doesn't work without some basis in theory. If a person looks at a painting, without narrative theory, it might not be obvious how to extract data. As an example, consider the oil painting below from my own collection: this piece is called "The Rock" by K. McLeary in 1967. This painting hangs in the hallway outside my bedroom.
I know the painting isn't clear from the photograph. The child on the left seems to be trying to reach a seagull further up the rock. This painting has many potential stories. Even before data is collected, if any data can be collected, I think that many people would agree that this painting contains details pertaining to setting and scenery. Now, if we all became theoretical ontologists - as I periodically identify myself to be - beyond the basic scenery, I believe that some people would zoom into aspects of the painting that they themselves consider important. A child climbing a large slippery rock might be injured, leading me to suggest "physical hazard" as important. How about the contrast between sky, earth, and water indicating an intersection of freedoms for birds, humans, and boats? Yep, I'm afraid that's where my head goes when I'm staring at things. It seems like a nice family outing, right? What if the children don't actually belong to the lady? Child abduction is possible. Ontological significance represents a different type of data than setting and scenery. I get to "attach" my impressions to the setting. Worded differently, the setting allows me to attach certain impressions.
BERLIN is all about filling the space between setting and significance. Significance is an ontological assertion. Believe it or not, although I won't emphasize the point here, even scenery might be regarded as an assertion. For the sake of argument I would agree that scenery tends to be more objectively confirmable than asserted. (Then of course we discover that the rock is part of a troll's forehead buried in the sand - that the kid is actually a wizard about to raise the troll to attack the lady and her child.) The question is how one might go from the setting to the assertion. The development of narrative is one aspect of Narrative Theory (according to me - offering both theory and application as a package deal). I think some readers would be interested in how social construction might lead to the pathology of a criminal narrative. It is within the context of social construction where there can be a convergence between story and factual reality. Social construction brings about predictability. The telling of a story doesn't necessarily lead to the prediction of an outcome. I feel that Narrative Theory more generally speaking examines how personal circumstances, ideals, and social settings influence the construction of stories. However, I believe that with computers, narratives can be used to help researchers study data and different phenomena of social construction.
Implementation of BERLIN
BERLIN is currently implemented using a program called "Elmira," which I likewise introduced a few blogs ago. A peculiarity of Elmira that might matter to those interested in developing their own system is that it is non-relational. I never have to worry about inadequate table space since no tables are used (except for presentation purposes). Elmira is currently written in Java. In the past I have included a screen shot. The problem these days is that the screen is so busy that it is difficult to make sense of it from a picture. I will give it a try mostly to give readers a general sense of appearance - see below. Rather detailed, right? I am so happy to use two screens these days: I might have a specimen or data resource on one screen and Elmira on the other. I understand that these days some systems permit the use of three screens simultaneously. I might give the 3-screen approach more thought in the future.
On the left is something like a data-entry screen. Inputs are computer assisted. Because Elmira is non-relational, fields can be added on-the-fly without much difficulty at all; however, this means that large numbers of fields quickly build up, and I often need the computer to "guess" the field I am trying to invoke. It is my experience that for any given narrative, there are usually just a limited number of key characters; and these people are used repeatedly throughout the story. To speed up the process of finding the correct fields, a quick-select field table appears on the right screen. Behind this right screen is Disney's "Black Hole," which happens to be my source data. Ideally the source data would occupy a separate monitor. Here then is a situation where three monitors would be best.
Structure of BERLIN
Using BERLIN, behaviours are expressed with lines of structured code - each line for a single action. Each line of code has two major parts: 1) a "proxy" - describing the general nature of the action; and 2) "symbols" - conveying the meaning of the action in specific details. Proxies and symbols make use of "literals" or tokens. A process of conversion into tokens might therefore reasonably extract keywords from a sample of text. However, tokenization is an algorithmic process rather than interpretative - meaning that literals might best be taken literally. In theory, a proxy contains "only sense - no tense." This means that there are no verbal conjugations. (The fact that stories tend to stick to a single tense is a good indication that tense in many cases is a linguistic rather than narrative issue.) A proxy is made up of a gerund-root, a preposition, a subject, and an object. The gerund-root is a verbal abbreviation of the gerund. So when a verb is encountered, think gerund noun. By convention, the preposition is usually "of" thereby creating a bias in terms of subject and object. My use of the terms "subject" and "object" might not directly correspond to their grammatical equivalents. I provide some examples and hope for the best:
[detain.of_per=vic] means that the perpetrator (sub) detains the victim (obj)
[waste.of_vic=*money*] means that the victim (sub) wastes *money* (obj)
[surrender.of_inv=*self*] means that the involved (sub) surrenders *self* (obj)
[borrow.of_vic=*money*] means that the victim (sub) borrows *money* (obj)
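The four examples above follow a regular enough shape that a proxy can be split into its parts mechanically. The sketch below is only an illustration in Python, not Elmira's actual parser (Elmira is written in Java, and its internals are not shown here). The names `Proxy`, `PROXY_RE`, and `parse_proxy` are hypothetical, and the pattern assumes exactly the layout seen in these examples: gerund-root, a dot, preposition, underscore, subject, equals sign, then an object that is either a role abbreviation or a literal between asterisks.

```python
import re
from typing import NamedTuple


class Proxy(NamedTuple):
    """Hypothetical parsed form of a proxy such as [detain.of_per=vic]."""
    root: str             # gerund-root, e.g. "detain" (think "detaining")
    preposition: str      # usually "of"
    subject: str          # role abbreviation, e.g. "per", "vic", "inv"
    obj: str              # role abbreviation or literal name
    obj_is_literal: bool  # True when the object was written as *literal*


# Assumed layout, based only on the four examples in this post.
PROXY_RE = re.compile(
    r"^\[(?P<root>[a-z]+)\.(?P<prep>[a-z]+)_(?P<subj>[a-z]+)"
    r"=(?P<obj>\*[a-z_]+\*|[a-z]+)\]$"
)


def parse_proxy(text: str) -> Proxy:
    """Split one proxy into gerund-root, preposition, subject, and object."""
    match = PROXY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable proxy: {text!r}")
    obj = match["obj"]
    return Proxy(
        root=match["root"],
        preposition=match["prep"],
        subject=match["subj"],
        obj=obj.strip("*"),
        obj_is_literal=obj.startswith("*"),
    )


detain = parse_proxy("[detain.of_per=vic]")
waste = parse_proxy("[waste.of_vic=*money*]")
```

Here `detain.subject` is "per" and `detain.obj` is "vic", while `waste.obj` is the literal "money". Anything that departs from the assumed layout raises a `ValueError` rather than being guessed at.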
I made use of two "literals": *money* and *self*. There can be a dictionary containing many thousands of literals. On the other side of each line are the symbols, which are meant to connect literals to explain the specific nature of proxies. There are quite a lot of symbols. I will introduce a few of them here. The symbols are mostly (but not necessarily) English prepositions. When I ran out of reasonable English choices, I turned to French, German, and Norwegian prepositions.
The action takes place (symbols are usually preceded by "_") . . .
_per: according to somebody or something
_for: on behalf of somebody or something
_contra: contrary to somebody or something
_pour: for an intended outcome
_about: as a consequence of a situation
_amid: while something else is occurring
_upon: as a consequence of a trigger (hypothetical)
_ainsi: possibly doing something (hypothetical)
_enfin: resulting in an actual outcome
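For anyone building their own tool, the symbols listed above could be held in a small lookup table. The Python sketch below is an assumption about one convenient layout, not a description of Elmira. The `hypothetical` flag simply records the "(hypothetical)" note attached to "_upon" and "_ainsi" in the list, which makes it possible to separate what actually happened from what might happen.

```python
# Each symbol maps to (meaning, hypothetical?). Meanings are copied from the
# list above; the table layout itself is an illustrative assumption.
SYMBOLS = {
    "per": ("according to somebody or something", False),
    "for": ("on behalf of somebody or something", False),
    "contra": ("contrary to somebody or something", False),
    "pour": ("for an intended outcome", False),
    "about": ("as a consequence of a situation", False),
    "amid": ("while something else is occurring", False),
    "upon": ("as a consequence of a trigger", True),
    "ainsi": ("possibly doing something", True),
    "enfin": ("resulting in an actual outcome", False),
}


def describe(symbol: str) -> str:
    """Return the meaning of a symbol, accepting it with or without the leading underscore."""
    meaning, _ = SYMBOLS[symbol.lstrip("_")]
    return meaning


def hypothetical_symbols() -> list[str]:
    """Symbols that mark something that has not actually happened."""
    return sorted(name for name, (_, hypo) in SYMBOLS.items() if hypo)
```

With this table, `describe("_amid")` returns "while something else is occurring", and `hypothetical_symbols()` returns the two symbols marked hypothetical.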
Specimen: ‘According to Janet, "Julie says that if she ever loses her job, she would have to borrow money from her parents to pay for rent. The way the market is, she might even do something risky like sell herself to make ends meet."' Obviously there is a reason why I post blogs rather than try to publish novels.
[include_role="Julie's parent"~as *people_parents*]
I believe the BERLIN code captures the behavioural gist of the specimen. Some might argue that job loss is quite a separate issue from borrowing money. The line actually says that for Julie, job loss and borrowing money go together. English sentences are linear both in structure and reasoning. But this is not necessarily so in relation to BERLIN lines of code. Symbols might be thought of as petals surrounding the centre; and in this centre is the proxy. The syntax is therefore conceptually much simpler than an English sentence. The above narrative is interesting in that it contains various intangible details: Julie hasn't lost her job. Julie isn't the person giving information but rather a friend. Yet does the narrative contain no value? What if Julie is found dead in a motel room one day? Upon further investigation it is found that her supervisor was threatening to harass and possibly fire various women at their factory if they refused his sexual advances. The conversation between friends then fits a particular context although it isn't exact: Julie had a willingness to assume more risk to protect her financial situation. The circumstances surrounding the hypothetical are actually very material.
Those accustomed to data extraction will likely recognize the literals as "search keywords." BERLIN supports fairly deep questions such as the following: given cases of prostitution, what circumstances seem relevant? In the above, this might be a question about what literals can be found associated with "amid." In Julie's case, *prostitution* would be found "amid" *high_unemployment*. On the Elmira narrative database, I might ask the following: in situations of *forced_abduction* leading to *forced_confinement* using *movement_van*, what sort of surroundings "de" are associated? (The symbol "de" has the sense of "at" meaning "immediate surroundings.") I use the term "forced abduction" although I guess there aren't other forms. I just want "abduction" to appear whenever I enter "forced" on the computer-assisted search tool.
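The kind of question described above can be sketched in a few lines of Python. To be clear, this is not how Elmira answers it. The `CodedLine` record, the `associated` and `suggest` functions, and the sample lines are all hypothetical, made up to show the idea that each coded line carries literals in its proxy plus further literals hanging off symbols such as "amid."

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodedLine:
    """Hypothetical record: literals in the proxy, plus literals attached to each symbol."""
    proxy_literals: set[str]
    symbols: dict[str, set[str]] = field(default_factory=dict)


def associated(lines: list[CodedLine], literal: str, symbol: str) -> Counter:
    """Count the literals attached to `symbol` on every line whose proxy mentions `literal`."""
    counts: Counter = Counter()
    for line in lines:
        if literal in line.proxy_literals:
            counts.update(line.symbols.get(symbol, ()))
    return counts


def suggest(prefix: str, dictionary: set[str]) -> list[str]:
    """Computer-assisted lookup: literals that start with what has been typed so far."""
    return sorted(name for name in dictionary if name.startswith(prefix))


# Made-up sample lines; only the first loosely reflects Julie's case.
lines = [
    CodedLine({"prostitution"}, {"amid": {"high_unemployment"}}),
    CodedLine({"money"}, {"upon": {"job_loss"}}),
    CodedLine({"prostitution"}, {"amid": {"high_unemployment", "debt"}}),
]

circumstances = associated(lines, "prostitution", "amid")
choices = suggest("forced", {"forced_abduction", "forced_confinement", "movement_van"})
```

In this toy data, *high_unemployment* turns up "amid" both cases of *prostitution*, and typing "forced" brings up both *forced_abduction* and *forced_confinement*, which is why I keep the prefix on "abduction."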
Going Beyond the Literal
There are good reasons to have a human extend the narrative beyond its literal meaning. This is because the meaning might not be entirely literal. For example, running into a burning building might be the literal event. Deliberately risking one's life to save a person inside the building is a reality beyond the literal. Some might take the position that the literal meaning is the only one that counts - that the inferences should be left to those reading the story. First of all, narrative sterility makes more sense if those reading the story are human. Secondly, one cannot safely infer what the absence of facts cannot necessarily substantiate. Thirdly, the contextual non-literal meaning is an important part of the story.
A teenager who says, "I hate Muslims" might be uttering a fact reflecting his personal animosity. But he might really be saying, "I intend to kill other students to make my case." He might never target Muslims specifically but people more generally in an effort to make a convoluted political statement. In Ontario just a few weeks ago, there was a teenager who entered a school wielding two kitchen knives. There is no way for this setting to unfold into a good story. She started attacking students - and also injured some teachers. People speculated "after the fact" (never before but always upon reflection) that this teenager struggled with bullying. According to her social media presence, she didn't even expect to survive. She expected to die as a result of her attack on the school. Preceding the meltdown, I suggest that this girl's sentiments are really much more important than the literal stories. When the literal facts leave out critical details, these facts represent a fabrication of reality. "How are you?" "Oh, I'm fine." The interaction is so sterile as to be inaccurate. The narrative is suppressed by social customs and expectations. The narrative even if it is full of dark pits and broken paths is the one that should receive society's attention.
Conversion to Different Languages
I remember a movie called "Child 44" where the main character, Tom Hardy, becomes head of Moscow Homicide. ("Hardy" doesn't sound like a particularly Russian name, but he plays the part well.) I just sat back thinking, what a fabulous storyline: on one hand there is political rigidity meant to stabilize society; on the other hand, I am sure that Moscow has wildly interesting homicide cases. Converting between English and Russian is challenging. But I suspect that conversion between two BERLIN databases is probably quite easy because of commonalities in the linguistic interface. Researchers in one country could study the social circumstances of people in another assuming the narrative has been codified - and that literal, symbolic, and active equivalents are made available.
Big Data Alienation
Quantitative data is comforting. In data collection, analysis, and its presentation, I always have deliverables. Of course, the deliverables are quantitative. Perhaps our preoccupation with numbers can contribute to some level of alienation. We quantify even the quantitatively evasive and rob these things of their non-quantitative identities. Conceivably, a security analyst could stare at tables and streams of numbers all day "hunting" for terrorists. Every incident of mass murder is a reminder that terror doesn't seem to conform to the numbers. A student or perhaps a total stranger can enter a school and start shooting. The charts don't really help. Some then suggest that the problem might be related to an absence of data. In Canada, we sometimes form a special "Royal Commission" to investigate exceptionally troubling and challenging problems. What might a Royal Commission do to investigate the deaths of thousands of First Nations women? It will collect data of course. Maybe a great deal of it will be quantitative, further dragging stakeholders away from the narrative realm where perhaps the stories are able to take shape most naturally. Numbers are great to determine how to allocate funds in a budget; but the allocation of funds per se might not actually resolve problems. Having a lot of numbers without narrative certainly doesn't impair the distribution of money. The absence of narrative is a good indication that the numbers are alienated from reality; thus, spending might fail to address reality.
By no means is my intention to dismiss quantitative analysis. Quantitative analysis has a place. My biggest concern is how it has been the "only place" for such a long time. Data science has become synonymous with a quantitative approach. But doing such an analysis is really like "jumping to conclusions": e.g. bullying can be understood through the study of numbers. I suggest that this is actually an extremely radical suggestion - the assertion that complex social interactions can be understood without ever studying complex social interaction - by examining social metrics. I suppose it is a bit like studying "human needs" through their department store purchases. The numbers certainly represent a measurement of something; and the meaning of things can be subsumed under that measurement. The measurement commandeers the meaning. For example, if one examines drug abuse in a remote community through accounting data, "drug abuse" becomes an aspect of accounting. Similarly, economic data will sometimes show growth in an area of environmental catastrophe; true enough in the context of these metrics maybe there was "growth." With the objective to bolster economic metrics, the United States might invite terrorists to demolish and blow up buildings.
Quite a different approach is to spend some time at the narrative level - canvassing stories and studying their characteristics. I was interested to discover for example how often "bullying" occurs in movies where there is a critical mission - when the inability to conform to a control structure can lead to mission failure. While bullying is complex, certainly the expectation of being told what to do creates a narrative where people take the opportunity to tell others what to do. One can surmise from the narrative that the "expectation of conformity" might contribute to bullying irrespective of whether that expectation is justified. This then is a direction for quantitative "confirmation." The confirmation might be considered in relation to a specific context - e.g. in a paramilitary organization like the police. Approaching a situation both by its narrative and using quantitative analysis, one would gain both confirmation and context. Confirmation without context is actually nothing - or almost nothing. "Meaning" should never be reduced to a mathematical or statistical construct.
I don't have an exact time-table. I haven't worked out all of the details, but I certainly wouldn't be opposed to making Elmira open source. I might release it closer to my retirement - maybe in about 15 years. That sounds like a long time. I'm giving the market an opportunity to introduce its own solutions. Encoded narrative can change how companies store information. It can also radically change the applied use of materials containing narrative. Stories become much more than just stories - they transform into resources for social research. News agencies could become research centres. Journalism might mutate into a field of intelligence. The shift making greater use of narrative can trickle down to all sorts of operations. However, I want to take the emphasis away from the software, which I believe few people would be able to use effectively. People should really open their minds before I open the code. Also, the code doesn't mean a great deal. The value is in the coded narrative - not the code of the narrative but my use of coding to narrate. That has been my point all along. The value isn't in the telling of a series of events - but in my recognition of relevance and attachment of meaning.