
Towards a New Data Modelling Architecture – Part 2

Atomic Information Resource (AIR)

How do we design a data model, how do we connect data, how do we represent information, and how do we store or retrieve it? These are all fundamental questions in data modeling, but there is a common key that unlocks them. You have to start by defining a primitive information resource, and then understand how complex information structures can be built on top of these fundamental units. This is because everything in nature and in engineered systems follows this kind of abstraction, from the simple to the most sophisticated. There are patterns that recur at progressively smaller scales, and there are fundamental building blocks from which higher-order structures are built.

For more than thirty years, the data modeling world has been dominated by records: records in the form of a row in a table, in the form of hierarchically structured XML/JSON documents, or in the form of property-graph nodes. Many regard the RDF triple, Subject-Predicate-Object, as the fundamental structure, but this too can be seen as a form of record, one that confines you to thinking in terms of a function-functor that maps information resources from a domain set to information resources of a range set. Neither the nature and reference mechanism of these resources nor their linkage type is defined in a sufficient or efficient manner.

This article introduces an alternative view on data modeling that can extend and enrich RDF. It is based on AIR, analyzed with the R3DM framework, and exemplified with AtomicDB. AIR is the oxygen that makes this database technology breathe. It makes it alive and kicking.

AIR fits perfectly with the duality principle of the R3DM conceptual model and its corollary, Everything is Represented with a Symbol. There is no better example of this than digital representations as sequences of binary digits in the internal memory state of our machines. In a digital computer, everything is represented and addressed at the machine level with sequences of 0s and 1s. To represent information, Ron Everett conceived, in a similar way but at a higher level of abstraction, an identification and addressing scheme for information units. He wrapped atomic data types, such as a string or a number, made them the core of these units, and used a four-dimensional space to uniformly address, identify, bind and encode AIRs. Thus in AtomicDB each AIR unit is a self-referenced and uniquely identified item in a 4D space, with sets of 4D references to other AIRs for classification purposes and embedded data values for querying purposes.
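
To make the description above more concrete, here is a minimal Python sketch of what such a unit might look like. It is not AtomicDB's actual implementation or API: the names `Ref4D`, `AIR` and `AirStore` are hypothetical, and the meaning of the four coordinates is left open because the text does not define it. The sketch only illustrates the three ingredients mentioned above: a unique 4D self-reference, an embedded data value, and a set of 4D references to other units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

# Hypothetical 4D address; what each coordinate means is an assumption
# left open here, since the article does not specify it.
Ref4D = Tuple[int, int, int, int]


@dataclass
class AIR:
    """Illustrative atomic information resource (not AtomicDB's real type)."""
    ref: Ref4D                                     # unique self-reference in the 4D space
    value: Union[str, int, float]                  # wrapped atomic data value, used for querying
    links: Set[Ref4D] = field(default_factory=set) # 4D references to other AIRs


class AirStore:
    """Toy in-memory container that keeps every unit addressable by its 4D reference."""

    def __init__(self) -> None:
        self._units: Dict[Ref4D, AIR] = {}

    def add(self, ref: Ref4D, value: Union[str, int, float]) -> AIR:
        if ref in self._units:
            raise ValueError(f"reference {ref} is already in use")
        unit = AIR(ref, value)
        self._units[ref] = unit
        return unit

    def link(self, a: Ref4D, b: Ref4D) -> None:
        # Each side stores the other's reference, so the link can be
        # followed from either end.
        self._units[a].links.add(b)
        self._units[b].links.add(a)

    def neighbours(self, ref: Ref4D) -> List[AIR]:
        return [self._units[r] for r in sorted(self._units[ref].links)]

    def find_by_value(self, value: Union[str, int, float]) -> List[AIR]:
        return [u for u in self._units.values() if u.value == value]


store = AirStore()
color = store.add((1, 0, 0, 1), "Color")
red = store.add((1, 0, 0, 2), "red")
store.link(color.ref, red.ref)
```

In this toy version, `store.neighbours(color.ref)` returns the unit holding `"red"`, and `store.find_by_value("red")` finds the same unit through its embedded value, which mirrors the two access paths (by reference and by value) described above.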

According to R3DM there are three layers of abstraction: the semantic layer, the sign layer, and the storage (data) layer. The 4D reference type of AIR is the implementation of the sign layer, and it bridges the semantic layer with the storage layer in the most semiotic way. This is a deliberate response to the fact that atomic data types and data structures cannot adequately play both roles, encoding information and representing it. You have to make these two roles distinct, and this is exactly what these references achieve in a beautiful way. The symbolic layer is created in this 4D space and, instead of having dissimilar atomic data types and abstract complex data types, you have uniform AIR units and aggregates of them (e.g. collections, records, sets and multisets) that are referenced in exactly the same way.
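
The idea that single units and aggregates are referenced in exactly the same way can be illustrated with a small Python sketch. This is an assumption-laden toy, not how AtomicDB or R3DM actually store data: the `Ref` tuple, the `storage` dictionary and the `resolve` helper are hypothetical names introduced only for illustration. The point is that the sign layer (the references) stays uniform, while the storage layer holds either an atomic value or a group of member references.

```python
from typing import Dict, Tuple, Union

Ref = Tuple[int, int, int, int]   # hypothetical 4D reference (sign layer)
Atom = Union[str, int, float]     # encoded value (storage layer)
Aggregate = Tuple[Ref, ...]       # hypothetical aggregate: a tuple of member references

# Every entry, atomic or aggregate, is addressed by the same kind of key.
storage: Dict[Ref, Union[Atom, Aggregate]] = {
    (0, 0, 0, 1): "Alice",
    (0, 0, 0, 2): 30,
    (0, 0, 1, 1): ((0, 0, 0, 1), (0, 0, 0, 2)),  # a record made of two units
    (0, 0, 2, 1): ((0, 0, 1, 1),),               # a collection holding that record
}


def resolve(ref: Ref, store: Dict[Ref, Union[Atom, Aggregate]]):
    """Follow a reference down to plain values.

    Atoms are returned as they are; aggregates are resolved member by member.
    This sketch assumes there are no reference cycles.
    """
    item = store[ref]
    if isinstance(item, tuple):
        return [resolve(member, store) for member in item]
    return item


record = resolve((0, 0, 1, 1), storage)
collection = resolve((0, 0, 2, 1), storage)
```

A caller never needs to know in advance whether a reference points to a single value, a record, or a collection of records; the same `resolve` call handles all three cases.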

This is a completely new, radical perspective on data modeling. It is a turning point, and there has to be ample evidence to support claims that DBMSs based on it are superior to their counterparts. We are determined to investigate, enhance, and apply this kind of database technology and connect it to the overall semiotic perspective of the R3DM conceptual framework. It is possible that R3DM based on AIR can assimilate all other SQL, NoSQL and SPARQL queries and data models in a way that is simpler, more intuitive, faster, more secure, highly consistent, and workable at large scale. This uniform structural symmetry based on AIR, in terms of both value representation and bi-directional relationships, is perhaps its most innovative feature and is what will hopefully make AIR the universal atomic information unit across the whole field of computer science. If not, I am sure that many other similar paradigms in data modeling will be based on this model, because a whole new unexplored path is now open and previously unimaginable applications of this technology can be turned into reality.
