How to Cross-Link Data and Why You Should Do So

The 3Vs model (Volume, Velocity, and Variety) is the foundation of big data. It is commonly used to describe the key features of big data problems, but in my view this is about to change. Big data is not just about size, speed, or formats; contextual enrichment is the most critical factor in extracting the most value from data. How well you bring seemingly unrelated data together and identify the valuable connections determines how much power you unleash from your data.

“Obviously, then, what is needed is not only people with a good background in a particular field, but also people capable of making a connection between item 1 and item 2 which might not ordinarily seem connected”. 

- Isaac Asimov “How Do People Get New Ideas?”

People tend to make connections between objects by instinct, and that instinct is shaped by past experience. Great ideas often come from identifying unusual and unique linkages. For example, Charles Darwin and Alfred Wallace connected the notion of overpopulation with their observations of how species differentiate to arrive at the theory of evolution. Another example is how Walmart, based on its historical sales data, predicted a surge in sales of strawberry Pop-Tarts when a hurricane was approaching, unlike competitors who stocked up on flashlights by instinct rather than by data.

“The problem with that is that some of the most interesting insights go unnoticed, because you don’t have the ability to look at your customer data across silos (or haystacks, if you will). And being able to do that could lead you to questions you would otherwise have never thought to ask.”

- H.O. Maycotte “The Big Data Challenge Isn't The Needle In The Haystack -- It's The Haystack”

The ability to make cross-connections is the key to turning information into knowledge. Calculating the connection between any two data elements, however, is an extremely complex task. This challenge drove our development of what we call “Cross-Link” in BigObject Analytics.
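To put a rough number on “extremely complex”: if every pair of n data elements is a candidate connection, the number of unordered pairs to evaluate is the binomial coefficient below, which grows with the square of n rather than linearly. The element counts are arbitrary examples chosen for illustration.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{1000}{2} = 499{,}500, \qquad
\binom{2000}{2} = 1{,}999{,}000
```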

Imagine you have a blog that covers a variety of topics. As an author, the more page views you get, the greater your sense of achievement and the stronger your motivation. Here is the question: how do you increase reader engagement? Web analytics can easily give you a basic sense of user behavior, but what can be more interesting is spotting the hidden factors behind these logged sessions. For example, you can import weather data to see whether readers’ interests vary under different weather conditions, calculate the correlations among posts, and then check whether those correlations differ across regions, between genders, or over time.
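A minimal Python sketch of the blog scenario, assuming two small hypothetical tables: a page-view log of (date, region, post, views) rows and a weather table keyed by (date, region). All names and numbers are made up for illustration, and this plain-Python join is not BigObject's Cross-Link implementation. The sketch (1) joins each page-view row to the weather for that date and region, then averages views per post and condition, and (2) computes the Pearson correlation between two posts' daily views separately for each region.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical page-view log: (date, region, post, views). Illustrative data only.
page_views = [
    ("2015-03-01", "north", "recipes", 120), ("2015-03-01", "north", "travel", 40),
    ("2015-03-02", "north", "recipes", 60),  ("2015-03-02", "north", "travel", 90),
    ("2015-03-03", "north", "recipes", 150), ("2015-03-03", "north", "travel", 35),
    ("2015-03-01", "south", "recipes", 50),  ("2015-03-01", "south", "travel", 80),
    ("2015-03-02", "south", "recipes", 70),  ("2015-03-02", "south", "travel", 100),
    ("2015-03-03", "south", "recipes", 90),  ("2015-03-03", "south", "travel", 120),
]

# Hypothetical weather table keyed by (date, region).
weather = {
    ("2015-03-01", "north"): "rain", ("2015-03-02", "north"): "sun",
    ("2015-03-03", "north"): "rain", ("2015-03-01", "south"): "sun",
    ("2015-03-02", "south"): "sun",  ("2015-03-03", "south"): "rain",
}


def views_by_condition(page_views, weather):
    """Inner-join page views to weather on (date, region); average views per (post, condition)."""
    totals = defaultdict(lambda: [0, 0])  # (post, condition) -> [sum of views, row count]
    for date, region, post, views in page_views:
        condition = weather.get((date, region))
        if condition is None:
            continue  # no weather record for this date/region, so the row drops out of the join
        acc = totals[(post, condition)]
        acc[0] += views
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def post_correlation_by_region(page_views, post_a, post_b):
    """Correlate two posts' daily views, computed separately for each region."""
    daily = defaultdict(lambda: defaultdict(dict))  # region -> date -> post -> views
    for date, region, post, views in page_views:
        daily[region][date][post] = views
    result = {}
    for region, days in daily.items():
        # Keep only dates on which both posts have a view count, in date order.
        pairs = [(d[post_a], d[post_b]) for _, d in sorted(days.items())
                 if post_a in d and post_b in d]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            result[region] = pearson(xs, ys)
    return result


print(views_by_condition(page_views, weather))
print(post_correlation_by_region(page_views, "recipes", "travel"))
```

With this toy data, the recipes post averages 120 views on rainy days versus 60 on sunny ones, and the two posts move in opposite directions in the north (correlation of roughly -0.97) but together in the south (correlation of 1.0), which is the kind of regional difference the scenario is looking for. Three days per region is far too little to mean anything in practice; the point is only the shape of the join-then-correlate computation.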

This type of correlation analysis normally requires heavy table joins in a database system. When the data size doubles, the computation can grow more than proportionally in both complexity and running time. As more and more data is collected and multidimensional data must be managed simultaneously, we need an efficient way to handle the complexity of cross-domain datasets.
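Why joins get expensive as data grows can be seen in a generic sketch. Two hypothetical tables (daily page views and daily weather, n rows each) are joined on a day key, first with a naive nested-loop join, which compares every row of one table with every row of the other, and then with a hash join, which indexes one table once and probes it once per row of the other. This is standard database textbook material used only for illustration; the post does not describe how Cross-Link avoids join cost, and this is not BigObject's implementation.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Naive join: compare every left row with every right row.
    Returns (joined rows, number of key comparisons)."""
    joined, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


def hash_join(left, right, key):
    """Hash join: index the right table by key once, then probe it per left row.
    Returns (joined rows, number of index inserts plus probes)."""
    index = defaultdict(list)
    work = 0
    for r_row in right:
        index[r_row[key]].append(r_row)
        work += 1
    joined = []
    for l_row in left:
        work += 1
        for r_row in index.get(l_row[key], []):
            joined.append({**l_row, **r_row})
    return joined, work


def make_tables(n):
    """Hypothetical tables: n daily page-view rows and n daily weather rows."""
    views = [{"day": d, "views": 100 + d % 7} for d in range(n)]
    weather = [{"day": d, "condition": "rain" if d % 3 == 0 else "sun"} for d in range(n)]
    return views, weather


for n in (100, 200):
    views, weather = make_tables(n)
    rows_nl, cost_nl = nested_loop_join(views, weather, "day")
    rows_h, cost_h = hash_join(views, weather, "day")
    print(n, len(rows_nl), cost_nl, len(rows_h), cost_h)
# prints: 100 100 10000 100 200
#         200 200 40000 200 400
```

Doubling n from 100 to 200 quadruples the nested-loop comparisons (10,000 to 40,000) while the hash join's work only doubles (200 to 400). Real database engines pick join strategies automatically, and a hash join still pays for every matching row it emits when keys repeat heavily, but the contrast shows why naive cross-domain joins become costly quickly.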

“I think we are all underestimating the impact of aggregated big data across many domains of human behavior, surfaced by smartphone apps.” 

- Marc Andreessen

The “quantified self” trend is a substantial driver of this type of analysis. Because today's smartphones provide so many data sources, they best demonstrate how putting all this data together can help individuals in every facet of life. An agile data analytics engine lets applications analyze data across many dimensions and deliver profound insights. The result is the emergence of intelligence.

“Intelligence is not only the ability to reason; it is also the ability to find relevant material in memory and to deploy attention when needed.”

“Creativity is associative memory that works exceptionally well.” 

- Daniel Kahneman “Thinking, Fast and Slow”

We have developed BigObject Analytics based on the philosophy of Cross-Link and present it through a simple interface, BigObject Shell, which lets you unleash the power of your data with elegant statements. The BigObject Analytics Docker image is currently available free of charge.

References:

“Isaac Asimov Asks, How Do People Get New Ideas?” MIT Technology Review, October 20, 2014

“Thinking, Fast and Slow,” Daniel Kahneman

“What Wal-Mart Knows About Customers' Habits,” The New York Times, November 14, 2004

“The Big Data Challenge Isn't The Needle In The Haystack -- It's The Haystack,” Forbes, January 20, 2015

© 2019 Data Science Central ®