
The first computer program that I encountered mimicking or emulating human interaction through language was called "Eliza." The version that I knew ran on the Commodore PET. It communicated in English. Eliza made comments that made some sense but which indicated a lack of understanding of the conversation. If a person mentions "mother," Eliza might pick up on this by asking for some elaboration; but then it would probably say something entirely disassociated from the conversation. One of my earliest projects after I learned how to program in a contemporary computer language was to create a type of Eliza that exhibited some basic structural awareness. I do not mean "self-awareness." I was thinking more along the lines of the computer on the Starship Enterprise that made use of language to carry out worthwhile functions. The conversation doesn't have to be particularly animated - or the personality of the AI all that lively. In my version of Eliza, which I called Rebecca, I devised a method to extract certain basic relationships between elements of data; the computer program was designed to provide responses in the context of these relationships. Rebecca made use of a structural object that I called a "pentaneuron" meant to contain a 5-sided relationship. Here is a funny story. One day as I was working on the program, it responded in a manner that I considered reasonable but rather peculiar in syntax - much as a young child would try to say something in English. I found the experience spooky because the response, although grammatically deficient, was so human - and the program seemed to be responding directly to what I was saying. I decided to shut the project down. I dismembered the code - enough to prevent me from reviving it. I replaced Rebecca with something less ethically problematic. This blog discusses some of the data structures connected with Rebecca's replacement.

About 15 years ago, I decided to write my own programming language: it made use of "is" and "has" relationships between elements of data. I called it Neura. I wrote it in Java. I still have the program today as a reference study or prototype. I gave Neura some basic programming features such as getting user input, printing output, and handling loops and conditionals. On Windows and indeed most operating systems, it is common to hold data in files and folders. Neura did not interpret its resources as files and folders but rather as "is" (using files) and "has" (using folders) relationships. Perhaps the "is-has" and "file-folder" distinction is more conceptual than functional. Rebecca by the way also made use of "is" and "has" relationships, but I believe it also used specific variations in order to retain details of actions. An "is-has" action is quite different from containment of data elements: e.g. to have a walk (action) is quite unlike having a wallet (containment). Neura was designed to "understand" only aspects of logical containment. This ability gave the program characteristics similar to any operating system. In response to a question such as "Where is my photo album?" it wouldn't be difficult to make my notepad computer respond, "Your photo album is in the desktop folder." "Where is the demon?" "There is a demon in a jungle in a novel about children on an island." The computer operations are similar.
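
The "Where is ...?" questions above amount to searching a containment tree and reporting the chain of containers. Here is a minimal Python sketch of that idea. It is not Neura's code; the `tree` contents and the function names `where_is` and `describe` are assumptions made up for illustration.

```python
# Hypothetical containment tree: each key "has" the keys nested beneath it.
tree = {
    "novel": {"island": {"jungle": {"demon": {}}}},
    "desktop": {"photo album": {}},
}


def where_is(name, node, trail=()):
    """Return the list of containers leading to `name`, or None if absent."""
    for key, child in node.items():
        if key == name:
            return list(trail)
        found = where_is(name, child, trail + (key,))
        if found is not None:
            return found
    return None


def describe(name, node):
    """Phrase the containment chain innermost container first."""
    trail = where_is(name, node)
    if trail is None:
        return f"I cannot find {name}."
    if not trail:
        return f"{name} is at the top level."
    return f"{name} is in " + " in ".join(reversed(trail))


print(describe("photo album", tree))
print(describe("demon", tree))
```

The same depth-first search answers both questions; only the stored relationships differ, which is the sense in which the operations are similar.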

"Is" and "Has" Distinction

Depending on one's occupation - or perhaps preoccupation - data might be a straightforward topic or issue. Data might present itself in a highly processed form and rather importantly only as an "is." For example, "X = 2" is an "is" type of assertion. If I were to point out that "cat=Felix" this does not negate the possibility that "Toronto has house" "house has family" "family has cat" and finally that "cat is Felix." Expressed as files and folders, opening the file cat by accessing the path Toronto/house/family/cat might indicate "Felix." It adds to clarity to say Toronto/house/family/pet/cat/name where name is "Felix." In this assertion, a computer could suggest that Felix is something that can occupy a house. Without a more detailed understanding of facts, family/pet could mean that Felix occupies the pet or that the pet ate Felix. Rebecca distinguished between actions and logical containment - but not Neura.
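
As a rough illustration of the file-and-folder reading, the sketch below models "has" as a nested dictionary (a folder) and "is" as a string leaf (the content of a file). The structure `world` and the helpers `resolve` and `relations` are hypothetical names chosen for this example, not part of Neura.

```python
# "has" = nested dict (folder); "is" = string leaf (file content).
world = {"Toronto": {"house": {"family": {"pet": {"cat": {"name": "Felix"}}}}}}


def resolve(path, root):
    """Follow a slash-separated path; return the value found, or None."""
    node = root
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def relations(node, parent=""):
    """Yield one plain-language assertion per link in the structure."""
    for key, child in node.items():
        if parent:
            yield f"{parent} has {key}"
        if isinstance(child, dict):
            yield from relations(child, key)
        else:
            yield f"{key} is {child}"


print(resolve("Toronto/house/family/pet/cat/name", world))
for line in relations(world):
    print(line)
```

Note that the sketch, like Neura, only records containment: nothing in it says whether "family has pet" means ownership, occupancy, or something else.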

In English, there are many aspects of "is" and "has." Suffice it to say that relational details can be complicated. Certain fields of study such as statistics make great use of data that is disassociated from any structural meaning. This disassociation is not necessarily problematic - e.g. if the relationships don't matter or if there is great preoccupation with uniformity. I use the term "preoccupation" because really in the immensity or enormity of existence where shades of meaning exist in structural details, it takes quite a one-sided single-mindedness to focus on a specific characteristic or property above all else. It's a bit like a person - after being invited to a buffet - piling up on donuts. It seems a little unusual to develop a narrow fetish amid such an abundance of data. The idea that statistics would have a relevant place in relation to big data is rather ironic because one would expect almost the exact opposite.

I am not the first or only person to raise "is" and "has" relationships in structural terms. As I noted earlier, perhaps many or all operating systems invoke these structures through the use of files and folders. (I certainly claim no ownership either as a theorist or developer.) Once broken up in relational terms, a computer might be able to deal with a command such as, "Find a cat named Felix having a red hat." The relationships would have to be in place: Toronto/house/cat/name=Felix; cat/hat/colour=red. The computer might not know the meaning of colour or whether the cat that has a hat is the same cat that is associated with the name of Felix. Nonetheless, here we have some reasonably good building blocks to construct something elaborate and sophisticated.
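
A command like "Find a cat named Felix having a red hat" can be sketched as a search for nodes whose sub-paths match a set of conditions. The Python below is only an illustration. It assumes, for simplicity, that the name and the hat are stored under the same `cat` node, which sidesteps the identity question raised above. The second cat and the names `lookup` and `find` are invented for the example.

```python
# Hypothetical data: two cats, each with a name and a hat colour.
world = {
    "Toronto": {
        "house": {"cat": {"name": "Felix", "hat": {"colour": "red"}}},
        "shed": {"cat": {"name": "Tom", "hat": {"colour": "blue"}}},
    }
}


def lookup(node, path):
    """Follow a slash-separated path below `node`; return None if it breaks."""
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find(root, kind, conditions, trail=""):
    """Return the paths of every node called `kind` that meets all conditions."""
    matches = []
    for key, child in root.items():
        path = f"{trail}/{key}" if trail else key
        if not isinstance(child, dict):
            continue
        if key == kind and all(lookup(child, p) == v for p, v in conditions.items()):
            matches.append(path)
        matches.extend(find(child, kind, conditions, path))
    return matches


print(find(world, "cat", {"name": "Felix", "hat/colour": "red"}))
```

The search treats `colour` and `red` as opaque labels; it matches structure without knowing what either word means.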

Most computer users should know that folders can contain both files and folders. This means that it is possible to deposit long structural paths into other paths (folders into folders); moreover, different shades of "is" can be added to a folder. For example, cat/name=Felix; cat/weight=17 pounds; cat/temperament=poor; and cat/age=7 years. In English, it is possible to say that a cat "is" 7 years (old). In French, it is more customary to say that a cat "has" 7 years. I therefore make the rather confusing suggestion that an "is" can be a specific type of "has" relationship: Toronto/house/cat/name/value="Felix." Alternatively, rather than Felix, the value could be a symbolic link to another relational path, which is probably closer to the truth for the operating system. Meaning can be added to both "is" and "has" using standard notation: cat/weight.yesterday=17 pounds; cat/weight.desirable=13 pounds; and cat/weight.loss=4 pounds. The exact notation is perhaps mostly a matter of creativity and need.
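
The dotted notation can be handled with very little machinery. The following Python sketch assumes one particular reading of it, namely `path.qualifier=value`, and the helper names `parse` and `pounds` are hypothetical. It stores the assertions from the paragraph above and derives the weight loss from the other two weights.

```python
def parse(assertion):
    """Split 'cat/weight.yesterday=17 pounds' into (path, qualifier, value)."""
    left, value = assertion.split("=", 1)
    path, _, qualifier = left.partition(".")
    return path, qualifier or None, value


def pounds(value):
    """Read the leading number from a value such as '17 pounds'."""
    return float(value.split()[0])


facts = {}
for line in [
    "cat/name=Felix",
    "cat/weight.yesterday=17 pounds",
    "cat/weight.desirable=13 pounds",
]:
    path, qualifier, value = parse(line)
    facts[(path, qualifier)] = value

# cat/weight.loss follows from the two stored weights.
loss = pounds(facts[("cat/weight", "yesterday")]) - pounds(facts[("cat/weight", "desirable")])
print(f"cat/weight.loss={loss:g} pounds")
```

Keying each fact on the pair (path, qualifier) lets several shades of the same attribute sit side by side without overwriting one another.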

"Has" asserts a hierarchical distribution of elements. It is possible to have a purely associative distribution or a mix of hierarchy and association: cat/happiness ([food] [mouse] [sofa] [catnip] [milk]) where events of no particular order might be associated with happiness for the cat. It might be difficult to use ordinary files and folders at this point. Perhaps it is necessary to make use of more advanced neural objects. Imagine a dendritic expansion from a folder into something like a reflex hub. Or, the individual associated events might be incorporated into an "is" file along with symbolic links to other structural paths: [[email protected]] [[email protected]] [[email protected]] [[email protected]] [[email protected]]. Where should the CPU go or stop going? For the sake of retaining logical cohesion, it is necessary for the processor to recognize resistance if it travels far from the main thread or line. However, for greater depth of meaning, the symbolic links that might be important should be brought closer to the main line.

Overcoming the Limits of Quantity

I find that people routinely talk about extracting data from plain language. While the general idea is sound, it is important to acknowledge the non-quantitative nature of the data that would be extracted. The expedition or journey might not lead to anything statistically meaningful. Interpreting natural language is at least conceptually - if one is focused on meaning rather than word or syntactical events - quite different from any form of quantitative analysis. Why? Well, the meaning of the data in natural language is embedded within the structure of the language and content of conversation - intrinsic to the instrument of conveyance and the items being conveyed. In contrast, the meaning of quantitative data is usually externally defined and contextually constrained. The data in the latter is free of structure because its relevance is actually regulated or imposed. If I study 10 years of sales data, I have myself 10 years of sales data. The data will not reveal itself as anything but sales data. It carries no meaning beyond the parameters set during and perhaps prior to collection. If I examine the relational structures associated with sales, I actually have no idea what to expect. I do not define a priori what I will extract.

There is a fundamental difference in the predefinition of meaning in quantitative analysis and its undefined conveyance in language. If we ask 100 retail shoppers why they have chosen to purchase a particular DVD title, they might all have different reasons. This is perfectly fine although perhaps difficult to quantify. On a quantitative basis, sales might be the only common thread available for analysis; of course, sales might have nothing to do with "why" people decided to purchase. I have many hundreds of DVDs. I don't recall ever basing my purchase on sales history. "Over 1 million copies were sold last month - I'm buying!" is not something I would ever say. The "is" and "has" details make it possible to gain substantive and materially relevant facts explaining the human reality surrounding the purchase of the DVD title. Quantitative data might just give the sales. The structural relationship between the elements of data represents an entirely different sphere of study. I think that most people would agree that structural data is not something that can be extracted and retained by accident - e.g. using a system intended to deal with quantitative data. The software has to be deliberately designed to articulate the meaning of phenomena within data. For me, one of the major challenges in the use of artificial intelligence in business involves the extraction of relational structures from common language. I consider quantitative methods useful for accounting, logistics, and budgeting - managing resources rather than expanding our understanding of the underlying business.
